From: Steven Haigh <netwiz@crc.id.au>
To: linux-btrfs@vger.kernel.org
Subject: Re: Trying to rescue my data :(
Date: Sat, 25 Jun 2016 02:26:48 +1000
Message-ID: <4e1bff4c-3cfc-4391-d093-8293bf29e795@crc.id.au>
In-Reply-To: <15415597-7f29-396e-8425-8cbbeb32e897@crc.id.au>



On 25/06/16 00:52, Steven Haigh wrote:
> Ok, so I figured that despite what the BTRFS wiki seems to imply, the
> 'multi parity' support just isn't stable enough to be used. So, I'm
> trying to revert to what I had before.
> 
> My setup consists of:
> 	* 2 x 3TB drives
> 	* 3 x 2TB drives.
> 
> I've got (had?) about 4.9TB of data.
> 
> My idea was to use a balance to convert the existing setup to the
> 'single' profile, delete the 3 x 2TB drives from the BTRFS filesystem,
> create a new mdadm-based RAID6 array (5 drives, running degraded on 3),
> create a new filesystem on that, then copy the data across.
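As a rough sanity check of the capacities in that plan, here is a minimal Python sketch. It assumes drive sizes are decimal terabytes (as drives are marketed), takes the data set as roughly 4.9 TB, and ignores metadata and filesystem overhead; the helper names are hypothetical.

```python
TB = 10**12   # decimal terabyte, as drives are marketed
TIB = 2**40   # binary tebibyte, as btrfs-progs reports sizes


def single_usable(sizes):
    """Btrfs 'single' profile: capacity is roughly the sum of all devices."""
    return sum(sizes)


def raid6_usable(member_count, member_size):
    """md RAID6: two members' worth of space holds parity."""
    return (member_count - 2) * member_size


def raid6_spare_failures(declared, present):
    """Further member failures a (possibly degraded) RAID6 can survive."""
    return 2 - (declared - present)


data = 4.9 * TB
staging = single_usable([3 * TB, 3 * TB])   # btrfs single on 2 x 3TB
target = raid6_usable(5, 2 * TB)            # md RAID6 sized for 5 x 2TB members

print(f"staging: {staging / TIB:.2f} TiB, fits data: {data <= staging}")
print(f"target:  {target / TIB:.2f} TiB, fits data: {data <= target}")
print(f"spare failures while degraded to 3 drives: {raid6_spare_failures(5, 3)}")
```

Both the interim 'single' filesystem and the five-member RAID6 come out at about 5.46 TiB, so the data fits either way, but with only 3 of 5 members present the new array has no redundancy left until the remaining drives are added.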
> 
> So, great - first the balance:
> $ btrfs balance start -dconvert=single -mconvert=single -f
> (Yes, I know it'll reduce the metadata redundancy.)
> 
> This was promptly followed by a system crash.
> 
> After a reboot, I can no longer mount the BTRFS filesystem read-write:
> [  134.768908] BTRFS info (device xvdd): disk space caching is enabled
> [  134.769032] BTRFS: has skinny extents
> [  134.769856] BTRFS: failed to read the system array on xvdd
> [  134.776055] BTRFS: open_ctree failed
> [  143.900055] BTRFS info (device xvdd): allowing degraded mounts
> [  143.900152] BTRFS info (device xvdd): not using ssd allocation scheme
> [  143.900243] BTRFS info (device xvdd): disk space caching is enabled
> [  143.900330] BTRFS: has skinny extents
> [  143.901860] BTRFS warning (device xvdd): devid 4 uuid 61ccce61-9787-453e-b793-1b86f8015ee1 is missing
> [  146.539467] BTRFS: missing devices(1) exceeds the limit(0), writeable mount is not allowed
> [  146.552051] BTRFS: open_ctree failed
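The 'missing devices(1) exceeds the limit(0)' line fits a simple rule: a writeable degraded mount is refused when more devices are missing than the least redundant block-group profile in use can tolerate. Once the interrupted balance had written some 'single' chunks (the `btrfs device usage` output further down lists Data,single on xvdc and xvdd), that limit drops to zero even though most of the data is still RAID6. The sketch below is a simplified model of that check, not the kernel's code; the per-profile tolerances are stated as assumptions.

```python
# Simplified model (assumption, not kernel code) of the degraded-mount check
# behind "missing devices(N) exceeds the limit(M)": the limit is the smallest
# number of device failures tolerated by any block-group profile in use.
TOLERATED_FAILURES = {
    "single": 0,
    "raid0": 0,
    "dup": 0,      # both copies live on the same device
    "raid1": 1,
    "raid10": 1,
    "raid5": 1,
    "raid6": 2,
}


def failure_limit(profiles_in_use):
    return min(TOLERATED_FAILURES[p] for p in profiles_in_use)


def writeable_mount_allowed(missing_devices, profiles_in_use):
    return missing_devices <= failure_limit(profiles_in_use)


before = {"raid6"}              # all chunks still RAID6
during = {"raid6", "single"}    # balance interrupted mid-conversion
print(failure_limit(before), writeable_mount_allowed(1, before))  # 2 True
print(failure_limit(during), writeable_mount_allowed(1, during))  # 0 False
```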
> 
> I can mount it read-only, but then I also get crashes when it seems to
> hit a read error:
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum 3245290974 wanted 982056704 mirror 0
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum 390821102 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum 550556475 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum 1279883714 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum 2566472073 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum 1876236691 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum 3350537857 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum 3319706190 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum 2377458007 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum 2066127208 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum 657140479 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum 1239359620 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum 1598877324 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum 1082738394 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum 371906697 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum 2156787247 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum 3777709399 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum 180814340 wanted 982056704 mirror 1
> ------------[ cut here ]------------
> kernel BUG at fs/btrfs/extent_io.c:2401!
> invalid opcode: 0000 [#1] SMP
> Modules linked in: btrfs x86_pkg_temp_thermal coretemp crct10dif_pclmul
> xor aesni_intel aes_x86_64 lrw gf128mul glue_helper pcspkr raid6_pq
> ablk_helper cryptd nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables
> xen_netfront crc32c_intel xen_gntalloc xen_evtchn ipv6 autofs4
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum 2610978113 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum 59610051 wanted 982056704 mirror 1
> CPU: 1 PID: 1273 Comm: kworker/u4:4 Not tainted 4.4.13-1.el7xen.x86_64 #1
> Workqueue: btrfs-endio btrfs_endio_helper [btrfs]
> task: ffff880079ce12c0 ti: ffff880078788000 task.ti: ffff880078788000
> RIP: e030:[<ffffffffa039e0e0>]  [<ffffffffa039e0e0>] btrfs_check_repairable+0x100/0x110 [btrfs]
> RSP: e02b:ffff88007878bcc8  EFLAGS: 00010297
> RAX: 0000000000000001 RBX: ffff880079db2080 RCX: 0000000000000003
> RDX: 0000000000000003 RSI: 000004db13730000 RDI: ffff88007889ef38
> RBP: ffff88007878bce0 R08: 000004db01c00000 R09: 000004dbc1c00000
> R10: ffff88006bb0c1b8 R11: 0000000000000000 R12: 0000000000000000
> R13: ffff88007b213ea8 R14: 0000000000001000 R15: 0000000000000000
> FS:  00007fbf2fdc0880(0000) GS:ffff88007f500000(0000) knlGS:0000000000000000
> CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fbf2d96702b CR3: 000000007969f000 CR4: 0000000000042660
> Stack:
>  ffffea00019db180 0000000000010000 ffff88007b213f30 ffff88007878bd88
>  ffffffffa03a0808 ffff880002d15500 ffff88007878bd18 ffff880079ce12c0
>  ffff88007b213e40 000000000000001f ffff880000000000 ffff88006bb0c048
> Call Trace:
>  [<ffffffffa03a0808>] end_bio_extent_readpage+0x428/0x560 [btrfs]
>  [<ffffffff812f40c0>] bio_endio+0x40/0x60
>  [<ffffffffa0375a6c>] end_workqueue_fn+0x3c/0x40 [btrfs]
>  [<ffffffffa03af3f1>] normal_work_helper+0xc1/0x300 [btrfs]
>  [<ffffffff810a1352>] ? finish_task_switch+0x82/0x280
>  [<ffffffffa03af702>] btrfs_endio_helper+0x12/0x20 [btrfs]
>  [<ffffffff81093844>] process_one_work+0x154/0x400
>  [<ffffffff8109438a>] worker_thread+0x11a/0x460
>  [<ffffffff8165a24f>] ? __schedule+0x2bf/0x880
>  [<ffffffff81094270>] ? rescuer_thread+0x2f0/0x2f0
>  [<ffffffff810993f9>] kthread+0xc9/0xe0
>  [<ffffffff81099330>] ? kthread_park+0x60/0x60
>  [<ffffffff8165e14f>] ret_from_fork+0x3f/0x70
>  [<ffffffff81099330>] ? kthread_park+0x60/0x60
> Code: 00 31 c0 eb d5 8d 48 02 eb d9 31 c0 45 89 e0 48 c7 c6 a0 f8 3f a0
> 48 c7 c7 00 05 41 a0 e8 c9 f2 fa e0 31 c0 e9 70 ff ff ff 0f 0b <0f> 0b
> 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90
> RIP  [<ffffffffa039e0e0>] btrfs_check_repairable+0x100/0x110 [btrfs]
>  RSP <ffff88007878bcc8>
> ------------[ cut here ]------------
> <more crashes until the system hangs>
> 
> So, where to from here? Sadly, I feel there is data loss in my future,
> but I'm not sure how to minimise it :\
> 
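Each 'csum failed ... csum X wanted Y' line means the checksum computed over the data read back from disk (X) did not match the value recorded in the checksum tree (Y). Btrfs of this era checksummed data with CRC-32C, and the log prints both values as unsigned decimals. A minimal, deliberately slow bitwise CRC-32C in Python illustrates the comparison; `verify_block` is a hypothetical helper, not a btrfs function.

```python
CRC32C_POLY_REVERSED = 0x82F63B78  # Castagnoli polynomial, bit-reversed form


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C: init 0xFFFFFFFF, reflected, final XOR 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ CRC32C_POLY_REVERSED
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


def verify_block(data: bytes, wanted: int) -> bool:
    """Compare a freshly computed checksum with the stored one."""
    found = crc32c(data)
    if found != wanted:
        print(f"csum failed: csum {found} wanted {wanted}")
        return False
    return True


good = b"123456789"
wanted = crc32c(good)
print(hex(wanted))                         # 0xe3069283 (standard CRC-32C check value)
print(verify_block(good, wanted))          # True
print(verify_block(b"123456780", wanted))  # reports a mismatch, then False
```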

The more I look at this, the more I wonder whether this is a total
corruption scenario:

$ btrfs restore -D -l /dev/xvdc
warning, device 4 is missing
checksum verify failed on 11224137433088 found EF5DE164 wanted 62BE2322
bytenr mismatch, want=11224137433088, have=11224137564160
Couldn't read chunk tree
Could not open root, trying backup super
warning, device 2 is missing
warning, device 4 is missing
warning, device 5 is missing
warning, device 3 is missing
checksum verify failed on 11224137433088 found EF5DE164 wanted 62BE2322
checksum verify failed on 11224137433088 found EF5DE164 wanted 62BE2322
bytenr mismatch, want=11224137433088, have=59973363410688
Couldn't read chunk tree
Could not open root, trying backup super
warning, device 2 is missing
warning, device 4 is missing
warning, device 5 is missing
warning, device 3 is missing
checksum verify failed on 11224137433088 found EF5DE164 wanted 62BE2322
checksum verify failed on 11224137433088 found EF5DE164 wanted 62BE2322
bytenr mismatch, want=11224137433088, have=59973363410688
Couldn't read chunk tree
Could not open root, trying backup super

$ btrfs restore -D -l /dev/xvdd
warning, device 4 is missing
checksum verify failed on 11224137433088 found EF5DE164 wanted 62BE2322
bytenr mismatch, want=11224137433088, have=11224137564160
Couldn't read chunk tree
Could not open root, trying backup super
warning, device 1 is missing
warning, device 4 is missing
warning, device 5 is missing
warning, device 3 is missing
bytenr mismatch, want=11224137170944, have=0
ERROR: cannot read chunk root
Could not open root, trying backup super
warning, device 1 is missing
warning, device 4 is missing
warning, device 5 is missing
warning, device 3 is missing
bytenr mismatch, want=11224137170944, have=0
ERROR: cannot read chunk root
Could not open root, trying backup super

$ btrfs restore -D -l /dev/xvde
warning, device 4 is missing
checksum verify failed on 11224137433088 found EF5DE164 wanted 62BE2322
bytenr mismatch, want=11224137433088, have=11224137564160
Couldn't read chunk tree
Could not open root, trying backup super
warning, device 1 is missing
warning, device 2 is missing
warning, device 4 is missing
warning, device 5 is missing
checksum verify failed on 11224137170944 found C9115A93 wanted 14526E28
checksum verify failed on 11224137170944 found C9115A93 wanted 14526E28
bytenr mismatch, want=11224137170944, have=59973365311232
ERROR: cannot read chunk root
Could not open root, trying backup super
warning, device 1 is missing
warning, device 2 is missing
warning, device 4 is missing
warning, device 5 is missing
checksum verify failed on 11224137170944 found C9115A93 wanted 14526E28
checksum verify failed on 11224137170944 found C9115A93 wanted 14526E28
bytenr mismatch, want=11224137170944, have=59973365311232
ERROR: cannot read chunk root
Could not open root, trying backup super

$ btrfs restore -D -l /dev/xvdf
warning, device 4 is missing
checksum verify failed on 11224137433088 found EF5DE164 wanted 62BE2322
bytenr mismatch, want=11224137433088, have=11224137564160
Couldn't read chunk tree
Could not open root, trying backup super
warning, device 1 is missing
warning, device 2 is missing
warning, device 4 is missing
warning, device 5 is missing
warning, device 3 is missing
bytenr mismatch, want=11224137170944, have=0
ERROR: cannot read chunk root
Could not open root, trying backup super
warning, device 1 is missing
warning, device 2 is missing
warning, device 4 is missing
warning, device 5 is missing
warning, device 3 is missing
bytenr mismatch, want=11224137170944, have=0
ERROR: cannot read chunk root
Could not open root, trying backup super

$ btrfs restore -D -l /dev/xvdg
warning, device 4 is missing
checksum verify failed on 11224137433088 found EF5DE164 wanted 62BE2322
bytenr mismatch, want=11224137433088, have=11224137564160
Couldn't read chunk tree
Could not open root, trying backup super
warning, device 1 is missing
warning, device 2 is missing
warning, device 4 is missing
warning, device 3 is missing
bytenr mismatch, want=11224137170944, have=11224137105408
ERROR: cannot read chunk root
Could not open root, trying backup super
warning, device 1 is missing
warning, device 2 is missing
warning, device 4 is missing
warning, device 3 is missing
bytenr mismatch, want=11224137170944, have=11224137105408
ERROR: cannot read chunk root
Could not open root, trying backup super
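Across all five devices the failures concentrate on two logical addresses: 11224137433088 (the first attempt on every device) and 11224137170944 (the backup-superblock attempts on xvdd, xvde, xvdf and xvdg). A 'bytenr mismatch' means the address recorded in the tree block's header (have) differs from the address that was requested (want). A small, hypothetical helper like the one below can tally such lines when comparing output from several devices; the regular expressions only cover the two message formats quoted here.

```python
import re
from collections import Counter

# Patterns for the two btrfs-progs messages seen in the output above.
CSUM_RE = re.compile(r"checksum verify failed on (\d+) found (\w+) wanted (\w+)")
BYTENR_RE = re.compile(r"bytenr mismatch, want=(\d+), have=(\d+)")


def summarize(log: str) -> Counter:
    """Count failures per requested logical address (bytenr) in a restore log."""
    hits = Counter()
    for line in log.splitlines():
        m = CSUM_RE.search(line) or BYTENR_RE.search(line)
        if m:
            hits[int(m.group(1))] += 1
    return hits


sample = """\
warning, device 4 is missing
checksum verify failed on 11224137433088 found EF5DE164 wanted 62BE2322
bytenr mismatch, want=11224137433088, have=11224137564160
Couldn't read chunk tree
bytenr mismatch, want=11224137170944, have=0
ERROR: cannot read chunk root
"""

for bytenr, count in summarize(sample).most_common():
    print(f"{bytenr} ({bytenr / 2**40:.2f} TiB): {count} failure(s)")
```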

If I mount it read-only:
$ mount -o nossd,degraded,ro /dev/xvdc /mnt/fileshare/

$ btrfs device usage /mnt/fileshare/

/dev/xvdc, ID: 1
   Device size:             2.73TiB
   Device slack:              0.00B
   Data,single:             5.00GiB
   Data,RAID6:              1.60TiB
   Data,RAID6:              2.75GiB
   Data,RAID6:              1.00GiB
   Metadata,RAID6:          2.06GiB
   System,RAID6:           32.00MiB
   Unallocated:             1.12TiB

/dev/xvdd, ID: 2
   Device size:             2.73TiB
   Device slack:              0.00B
   Data,single:             1.00GiB
   Data,RAID6:              1.60TiB
   Data,RAID6:              7.07GiB
   Data,RAID6:              1.00GiB
   Metadata,RAID6:          2.06GiB
   System,RAID6:           32.00MiB
   Unallocated:             1.12TiB

/dev/xvde, ID: 3
   Device size:             1.82TiB
   Device slack:              0.00B
   Data,RAID6:              1.60TiB
   Data,RAID6:              7.07GiB
   Metadata,RAID6:          2.06GiB
   System,RAID6:           32.00MiB
   Unallocated:           213.23GiB

/dev/xvdf, ID: 6
   Device size:             1.82TiB
   Device slack:              0.00B
   Data,RAID6:            882.62GiB
   Data,RAID6:              1.00GiB
   Metadata,RAID6:          2.06GiB
   Unallocated:           977.33GiB

/dev/xvdg, ID: 5
   Device size:             1.82TiB
   Device slack:              0.00B
   Data,RAID6:              1.60TiB
   Data,RAID6:              7.07GiB
   Metadata,RAID6:          2.06GiB
   System,RAID6:           32.00MiB
   Unallocated:           213.23GiB

missing, ID: 4
   Device size:               0.00B
   Device slack:           16.00EiB
   Data,RAID6:            758.00GiB
   Data,RAID6:              4.31GiB
   System,RAID6:           32.00MiB
   Unallocated:             1.07TiB
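Summing a `btrfs device usage` report per profile makes it easier to see how much was still 'single' when the balance stopped. Below is a hypothetical parser for the layout shown above; it only understands the `Type,Profile: size` lines with B/KiB/MiB/GiB/TiB units, and the totals are raw per-device allocation, so RAID6 figures include parity.

```python
import re
from collections import defaultdict

UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}
LINE_RE = re.compile(
    r"^\s*(Data|Metadata|System),(\w+):\s+([\d.]+)(B|KiB|MiB|GiB|TiB)\s*$"
)


def allocation_by_profile(report: str) -> dict:
    """Sum raw allocation per (type, profile) across all devices in the report."""
    totals = defaultdict(float)
    for line in report.splitlines():
        m = LINE_RE.match(line)
        if m:
            kind, profile, value, unit = m.groups()
            totals[(kind, profile)] += float(value) * UNITS[unit]
    return dict(totals)


sample = """\
/dev/xvdc, ID: 1
   Data,single:             5.00GiB
   Data,RAID6:              1.60TiB
/dev/xvdd, ID: 2
   Data,single:             1.00GiB
   Metadata,RAID6:          2.06GiB
"""

for (kind, profile), size in sorted(allocation_by_profile(sample).items()):
    print(f"{kind},{profile}: {size / 2**30:.2f} GiB")
```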

Hoping this isn't a total loss ;)

-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897



