* Issues with FS going read-only and bad drive
@ 2020-01-16  3:39 Sabrina Cathey
  0 siblings, 0 replies; only message in thread
From: Sabrina Cathey @ 2020-01-16  3:39 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 2494 bytes --]

Sent this earlier but it hasn't shown up on the mailing list.  I know
greylisting adds a delay, but usually not 30 minutes.

----

Required information up front:

uname -a;btrfs --version;btrfs fi show;btrfs fi df /shizzle/
Linux babel.thegnomedev.com 5.3.8-arch1-1 #1 SMP PREEMPT @1572357769
x86_64 GNU/Linux
btrfs-progs v5.3.1
Label: 'shizzle'  uuid: 92b267f2-c8af-40eb-b433-e53e140ebd01
Total devices 10 FS bytes used 34.18TiB
devid    2 size 5.46TiB used 4.28TiB path /dev/sdb1
devid    3 size 5.46TiB used 4.28TiB path /dev/sdg1
devid    4 size 5.46TiB used 4.28TiB path /dev/sdh1
devid    5 size 5.46TiB used 4.28TiB path /dev/sdi1
devid    6 size 5.46TiB used 4.28TiB path /dev/sdj1
devid    7 size 5.46TiB used 4.28TiB path /dev/sdf1
devid    8 size 5.46TiB used 4.28TiB path /dev/sda1
devid    9 size 5.46TiB used 4.28TiB path /dev/sdd1
devid   10 size 5.46TiB used 4.28TiB path /dev/sde1
devid   11 size 5.46TiB used 4.28TiB path /dev/sdc1

Data, RAID6: total=34.18TiB, used=34.13TiB
System, RAID6: total=256.00MiB, used=1.73MiB
Metadata, RAID6: total=60.00GiB, used=54.65GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

----

The dmesg output is over 100k and my understanding is that the list has
a size limit, so here is a pastebin: https://pastebin.com/d4BPRS6m

----

The story is that I found the server unresponsive, and when I rebooted
I ended up seeing that a disk was missing: https://i.imgur.com/iLgnNBM.jpg

I mucked about trying to figure out what to do.  I ended up rebooting
again to see if I could spot an issue in the drive controller BIOS, and
when I got back into the OS things seemed okay at first.  The
filesystem was mounted and looked fine, but then I noticed "parent
transid verify failed" errors in dmesg.
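The lines below are a reconstruction from memory (the real ones are in
the pastebin), but this is roughly how I counted them in the full dump:

```shell
# Sample of the kind of line I'm seeing (reconstructed, not verbatim),
# piped through the grep I used to count occurrences in the full dmesg:
printf 'BTRFS error (device sde1): parent transid verify failed on 123456 wanted 99887 found 99885\n' \
  | grep -c 'parent transid verify failed'
```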

It's late and I was grasping at straws with random googling.  I tried a
scrub; it failed and the filesystem went read-only.  I retried a few
times, because insanity.

I tried btrfsck (the default, non-destructive check) and it also bailed
out: https://i.imgur.com/ZEq0RjU.jpg

Looking at btrfs device stats, it looks like one of the devices
(/dev/sde) is bad - probably the one that was found missing initially.
I'm attaching the output of that command.  I'm way out of my depth
here; my thought is to use btrfs device delete /dev/sde1.
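To be explicit about what I'm contemplating (NOT run yet - the device
and mount point are the ones from my setup above, and I'd like
confirmation before touching anything):

```shell
# Sketch only: print the command instead of running it until someone
# confirms this is the right move for a failing RAID6 member.
dev=/dev/sde1
mnt=/shizzle
echo "would run: btrfs device delete $dev $mnt"
```

I've also read that btrfs replace is often preferred over delete+add
for a failing device, but I'm not sure whether that applies when the
device is still semi-present like this.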

Please can you help me not lose my data?  With this much data, I have
yet to invest in another set of disks for backup (I know RAID isn't a
backup and I should have one).

Any help would be most appreciated.

Thanks

Sabrina

[-- Attachment #2: btrfs.device.stats.shizzle.txt --]
[-- Type: text/plain, Size: 1558 bytes --]

[/dev/sdb1].write_io_errs    0
[/dev/sdb1].read_io_errs     0
[/dev/sdb1].flush_io_errs    0
[/dev/sdb1].corruption_errs  2
[/dev/sdb1].generation_errs  0
[/dev/sdg1].write_io_errs    0
[/dev/sdg1].read_io_errs     0
[/dev/sdg1].flush_io_errs    0
[/dev/sdg1].corruption_errs  0
[/dev/sdg1].generation_errs  0
[/dev/sdh1].write_io_errs    0
[/dev/sdh1].read_io_errs     0
[/dev/sdh1].flush_io_errs    0
[/dev/sdh1].corruption_errs  0
[/dev/sdh1].generation_errs  0
[/dev/sdi1].write_io_errs    0
[/dev/sdi1].read_io_errs     0
[/dev/sdi1].flush_io_errs    0
[/dev/sdi1].corruption_errs  4
[/dev/sdi1].generation_errs  0
[/dev/sdj1].write_io_errs    0
[/dev/sdj1].read_io_errs     0
[/dev/sdj1].flush_io_errs    0
[/dev/sdj1].corruption_errs  3
[/dev/sdj1].generation_errs  0
[/dev/sdf1].write_io_errs    0
[/dev/sdf1].read_io_errs     0
[/dev/sdf1].flush_io_errs    0
[/dev/sdf1].corruption_errs  0
[/dev/sdf1].generation_errs  0
[/dev/sda1].write_io_errs    0
[/dev/sda1].read_io_errs     0
[/dev/sda1].flush_io_errs    0
[/dev/sda1].corruption_errs  0
[/dev/sda1].generation_errs  0
[/dev/sdd1].write_io_errs    0
[/dev/sdd1].read_io_errs     0
[/dev/sdd1].flush_io_errs    0
[/dev/sdd1].corruption_errs  0
[/dev/sdd1].generation_errs  0
[/dev/sde1].write_io_errs    6075
[/dev/sde1].read_io_errs     5965
[/dev/sde1].flush_io_errs    184
[/dev/sde1].corruption_errs  0
[/dev/sde1].generation_errs  0
[/dev/sdc1].write_io_errs    0
[/dev/sdc1].read_io_errs     0
[/dev/sdc1].flush_io_errs    0
[/dev/sdc1].corruption_errs  0
[/dev/sdc1].generation_errs  0
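For anyone skimming the attachment, this is the filter I used to pull
out just the nonzero counters (two sample lines inlined here so the
snippet stands alone; I ran it against the attached file):

```shell
# Keep only stats lines whose counter (second field) is nonzero.
printf '[/dev/sde1].write_io_errs    6075\n[/dev/sdc1].write_io_errs    0\n' \
  | awk '$2 != 0 {print $1, $2}'
```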
