* Detailed RAID Status and Errors
@ 2014-02-26  1:27 Justin Brown
  2014-02-26  2:08 ` Chris Murphy
  0 siblings, 1 reply; 4+ messages in thread
From: Justin Brown @ 2014-02-26  1:27 UTC (permalink / raw)
  To: linux-btrfs

Hello,

I'm finishing up my data migration to Btrfs, and I've run into an
error that I'm trying to explore in more detail. I'm using Fedora 20
with Btrfs v0.20-rc1.

My array is a 5 disk (4x 1TB and 1x 2TB) RAID 6 (-d raid6 -m raid6). I
completed my rsync to this array, and I figured that it would be
prudent to run a scrub before I consider this array the canonical
version of my data. The scrub is still running, but I currently have
the following status:

~$ btrfs scrub status t
scrub status for 7b7afc82-f77c-44c0-b315-669ebd82f0c5
scrub started at Mon Feb 24 20:10:54 2014, running for 86080 seconds
total bytes scrubbed: 2.71TiB with 1 errors
error details: read=1
corrected errors: 0, uncorrectable errors: 1, unverified errors: 0

It is accompanied by the following messages in the journal:

Feb 25 15:16:24 localhost kernel: ata4.00: exception Emask 0x0 SAct
0x3f SErr 0x0 action 0x0
Feb 25 15:16:24 localhost kernel: ata4.00: irq_stat 0x40000008
Feb 25 15:16:24 localhost kernel: ata4.00: failed command: READ FPDMA QUEUED
Feb 25 15:16:24 localhost kernel: ata4.00: cmd
60/08:08:b8:24:af/00:00:58:00:00/40 tag 1 ncq 4096 in
                                           res
41/40:00:be:24:af/00:00:58:00:00/40 Emask 0x409 (media error) <F>
Feb 25 15:16:24 localhost kernel: ata4.00: status: { DRDY ERR }
Feb 25 15:16:24 localhost kernel: ata4.00: error: { UNC }
Feb 25 15:16:24 localhost kernel: ata4.00: configured for UDMA/133
Feb 25 15:16:24 localhost kernel: sd 3:0:0:0: [sdd] Unhandled sense code
Feb 25 15:16:24 localhost kernel: sd 3:0:0:0: [sdd]
Feb 25 15:16:24 localhost kernel: Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
Feb 25 15:16:24 localhost kernel: sd 3:0:0:0: [sdd]
Feb 25 15:16:24 localhost kernel: Sense Key : Medium Error [current]
[descriptor]
Feb 25 15:16:24 localhost kernel: Descriptor sense data with sense
descriptors (in hex):
Feb 25 15:16:24 localhost kernel:         72 03 11 04 00 00 00 0c 00
0a 80 00 00 00 00 00
Feb 25 15:16:24 localhost kernel:         58 af 24 be
Feb 25 15:16:24 localhost kernel: sd 3:0:0:0: [sdd]
Feb 25 15:16:24 localhost kernel: Add. Sense: Unrecovered read error -
auto reallocate failed
Feb 25 15:16:24 localhost kernel: sd 3:0:0:0: [sdd] CDB:
Feb 25 15:16:24 localhost kernel: Read(10): 28 00 58 af 24 b8 00 00 08 00
Feb 25 15:16:24 localhost kernel: end_request: I/O error, dev sdd,
sector 1487873214
Feb 25 15:16:24 localhost kernel: ata4: EH complete
Feb 25 15:16:24 localhost kernel: btrfs: i/o error at logical
2285387870208 on dev /dev/sdf1, sector 1488392888, root 5, inode
357715, offset 48787456, length 4096, links 1 (path:
PATH/TO/REDACTED_FILE)
Feb 25 15:16:24 localhost kernel: btrfs: bdev /dev/sdf1 errs: wr 0, rd
1, flush 0, corrupt 0, gen 0
Feb 25 15:16:24 localhost kernel: btrfs: unable to fixup (regular)
error at logical 2285387870208 on dev /dev/sdf1

I have a few questions:

* How is "total bytes scrubbed" determined? This array only has 2.2TB
of space used, so I'm confused about how many total bytes need to be
scrubbed before it is finished.

* What is the best way to recover from this error? If I delete
PATH/TO/REDACTED_FILE and recopy it, will everything be okay? (I found
a thread on the Arch Linux forums,
https://bbs.archlinux.org/viewtopic.php?id=170795, that mentions this
as a solution, but I can't tell if it's the proper method.)

* Should I run another scrub? (I'd like to avoid another scrub if
possible because the scrub has been running for 24 hours already.)

* When a scrub is not running, is there any `btrfs` command that will
show me corrected and uncorrectable errors that occur during normal
operation? I guess something similar to `mdadm -D`.

* It seems like this type of error shouldn't happen on RAID6 as there
should be enough information to recover between the data, p parity,
and q parity. Is this just an implementation limitation of the current
RAID 5/6 code?

Thanks,
Justin


* Re: Detailed RAID Status and Errors
  2014-02-26  1:27 Detailed RAID Status and Errors Justin Brown
@ 2014-02-26  2:08 ` Chris Murphy
  2014-02-26  6:19   ` Justin Brown
  0 siblings, 1 reply; 4+ messages in thread
From: Chris Murphy @ 2014-02-26  2:08 UTC (permalink / raw)
  To: Justin Brown; +Cc: linux-btrfs


On Feb 25, 2014, at 6:27 PM, Justin Brown <justin.brown@fandingo.org> wrote:

> Hello,
> 
> I'm finishing up my data migration to Btrfs, and I've run into an
> error that I'm trying to explore in more detail. I'm using Fedora 20
> with Btrfs v0.20-rc1.

You should have btrfs-progs-3.12-1.fc20.x86_64, it's available since November.


> I
> completed my rsync to this array, and I figured that it would be
> prudent to run a scrub before I consider this array the canonical
> version of my data.

Scrub can't fix problems with raid5/6 yet.
http://permalink.gmane.org/gmane.comp.file-systems.btrfs/31938


> 
> total bytes scrubbed: 2.71TiB with 1 errors
> 
> * How is "total bytes scrubbed" determined? This array only has 2.2TB
> of space used, so I'm confused about how many total bytes need to be
> scrubbed before it is finished.

Total includes metadata.


> Feb 25 15:16:24 localhost kernel: ata4.00: exception Emask 0x0 SAct
> 0x3f SErr 0x0 action 0x0
> Feb 25 15:16:24 localhost kernel: ata4.00: irq_stat 0x40000008
> Feb 25 15:16:24 localhost kernel: ata4.00: failed command: READ FPDMA QUEUED
> Feb 25 15:16:24 localhost kernel: ata4.00: cmd
> 60/08:08:b8:24:af/00:00:58:00:00/40 tag 1 ncq 4096 in
>                                           res
> 41/40:00:be:24:af/00:00:58:00:00/40 Emask 0x409 (media error) <F>
> Feb 25 15:16:24 localhost kernel: ata4.00: status: { DRDY ERR }
> Feb 25 15:16:24 localhost kernel: ata4.00: error: { UNC }
> Feb 25 15:16:24 localhost kernel: ata4.00: configured for UDMA/133
> Feb 25 15:16:24 localhost kernel: sd 3:0:0:0: [sdd] Unhandled sense code
> Feb 25 15:16:24 localhost kernel: sd 3:0:0:0: [sdd]
> Feb 25 15:16:24 localhost kernel: Result: hostbyte=DID_OK
> driverbyte=DRIVER_SENSE
> Feb 25 15:16:24 localhost kernel: sd 3:0:0:0: [sdd]
> Feb 25 15:16:24 localhost kernel: Sense Key : Medium Error [current]
> [descriptor]
> Feb 25 15:16:24 localhost kernel: Descriptor sense data with sense
> descriptors (in hex):
> Feb 25 15:16:24 localhost kernel:         72 03 11 04 00 00 00 0c 00
> 0a 80 00 00 00 00 00
> Feb 25 15:16:24 localhost kernel:         58 af 24 be
> Feb 25 15:16:24 localhost kernel: sd 3:0:0:0: [sdd]

All of this looks like a conventional bad sector read error. It's concerning that a bad sector would appear so soon after writing all your data onto this volume. What do you get for:

smartctl -x /dev/sdd
smartctl -l scterc /dev/sdd


> Feb 25 15:16:24 localhost kernel: Add. Sense: Unrecovered read error -
> auto reallocate failed

Also not reassuring.


> * What is the best way to recover from this error? If I delete
> PATH/TO/REDACTED_FILE and recopy it, will everything be okay? (I found
> a thread on the Arch Linux forums,
> https://bbs.archlinux.org/viewtopic.php?id=170795, that mentions this
> as a solution, but I can't tell if it's the proper method.
> 
> * Should I run another scrub? (I'd like to avoid another scrub if
> possible because the scrub has been running for 24 hours already.)

No. Run a balance instead in this case, due to the present scrub limitation on raid5/6.
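
For reference, a minimal sketch of that repair path (the mount point `t` is assumed from earlier in the thread; this is illustrative, not a verbatim recipe from the list):

```shell
#!/bin/sh
# Sketch: with this era's raid5/6 code, scrub can detect but not repair,
# so a full balance -- which rewrites every block group, data and parity --
# is the suggested repair path. Mount point 't' is assumed from the thread.
mnt="t"
cmd="btrfs balance start $mnt"

if command -v btrfs >/dev/null 2>&1 && mountpoint -q "$mnt" 2>/dev/null; then
    $cmd                      # long-running; monitor with: btrfs balance status "$mnt"
else
    echo "dry run: $cmd"      # no btrfs filesystem mounted here
fi
```

Note that a full balance re-reads and rewrites everything, so it can take as long as, or longer than, the scrub did.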


> 
> * When a scrub is not running, is there any `btrfs` command that will
> show me corrected and uncorrectable errors that occur during normal
> operation? I guess something similar to `mdadm -D`.

btrfs device stats /dev/X


> 
> * It seems like this type of error shouldn't happen on RAID6 as there
> should be enough information to recover between the data, p parity,
> and q parity. Is this just an implementation limitation of the current
> RAID 5/6 code?


The first problem is a device error on /dev/sdd reported to libata, which is the bulk of what you posted. However this:

> kernel: btrfs: i/o error at logical 2285387870208 on dev /dev/sdf1, sector 1488392888, root 5, inode 357715, offset 48787456, length 4096, links 1 (path: PATH/TO/REDACTED_FILE)
> kernel: btrfs: bdev /dev/sdf1 errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
> kernel: btrfs: unable to fixup (regular) error at logical 2285387870208 on dev /dev/sdf1

Is a bit confusing to me because it's a different drive. First, sdd itself reported a read error. Then btrfs detects an i/o error (?) on /dev/sdf. That's unexpected, although the fact that it can't fix the error is expected with the current raid5/raid6 support. What I can't tell is whether the two errors affected the same stripe, but by the looks of it the data itself is OK.

The hardware problems need to be addressed for sure, especially because while btrfs raid5/6 will reconstruct data from parity, it doesn't yet write reconstructed data back to bad sectors.


Chris Murphy


* Re: Detailed RAID Status and Errors
  2014-02-26  2:08 ` Chris Murphy
@ 2014-02-26  6:19   ` Justin Brown
  2014-02-26  6:42     ` Chris Murphy
  0 siblings, 1 reply; 4+ messages in thread
From: Justin Brown @ 2014-02-26  6:19 UTC (permalink / raw)
  To: linux-btrfs

Chris,

Thanks for the reply.

> Total includes metadata.

It still doesn't seem to add up:

~$ btrfs fi df t
Data, single: total=8.00MiB, used=0.00
Data, RAID6: total=2.17TiB, used=2.17TiB
System, single: total=4.00MiB, used=0.00
System, RAID6: total=9.56MiB, used=192.00KiB
Metadata, single: total=8.00MiB, used=0.00
Metadata, RAID6: total=4.03GiB, used=3.07GiB

Nonetheless, the scrub finished shortly after I started typing this
response. Total was ~2.7TB if I remember correctly.

> All of this looks like a conventional bad sector read error. It's concerning that a bad sector would appear so soon after writing all your data onto this volume. What do you get for:

> smartctl -x /dev/sdd

...
SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x000a  2            1  Device-to-host register FISes sent due to a COMRESET
0x8000  4       185377  Vendor specific




> smartctl -l scterc /dev/sdd

SCT Error Recovery Control:
           Read: Disabled
          Write: Disabled

> btrfs device stats /dev/X

All drives except /dev/sdf1 have zeroes for all values. /dev/sdf1
reports that same read error from the logs:

[/dev/sdf1].write_io_errs   0
[/dev/sdf1].read_io_errs    1
[/dev/sdf1].flush_io_errs   0
[/dev/sdf1].corruption_errs 0
[/dev/sdf1].generation_errs 0

> You should have btrfs-progs-3.12-1.fc20.x86_64, it's available since November.

I was performing this copy from live media and forgot to run updates
before starting. I've updated btrfs-progs to the newest version, which
matches what you provided. I'm running another scrub with the updated
btrfs-progs, and while it hasn't uncovered any errors yet, the scrub
won't be finished until tomorrow. (I know that won't do anything
besides indicate some kind of transient error or a problem with
0.20-rc1.) After that, I'll give balance a try to attempt to fix the
issue.

Thanks,
Justin


* Re: Detailed RAID Status and Errors
  2014-02-26  6:19   ` Justin Brown
@ 2014-02-26  6:42     ` Chris Murphy
  0 siblings, 0 replies; 4+ messages in thread
From: Chris Murphy @ 2014-02-26  6:42 UTC (permalink / raw)
  To: Justin Brown; +Cc: linux-btrfs


On Feb 25, 2014, at 11:19 PM, Justin Brown <justin.brown@fandingo.org> wrote:

> Chris,
> 
> Thanks for the reply.
> 
>> Total includes metadata.
> 
> It still doesn't seem to add up:
> 
> ~$ btrfs fi df t
> Data, single: total=8.00MiB, used=0.00
> Data, RAID6: total=2.17TiB, used=2.17TiB
> System, single: total=4.00MiB, used=0.00
> System, RAID6: total=9.56MiB, used=192.00KiB
> Metadata, single: total=8.00MiB, used=0.00
> Metadata, RAID6: total=4.03GiB, used=3.07GiB
> 
> Nonetheless, the scrub finished shortly after I started typing this
> response. Total was ~2.7TB if I remember correctly.

What do you get for btrfs fi show?

> 
>> All of this looks like a conventional bad sector read error. It's concerning that a bad sector would appear so soon after writing all your data onto this volume. What do you get for:
> 
>> smartctl -x /dev/sdd
> 
> ...
> SATA Phy Event Counters (GP Log 0x11)
> ID      Size     Value  Description
> 0x0001  2            0  Command failed due to ICRC error
> 0x0002  2            0  R_ERR response for data FIS
> 0x0003  2            0  R_ERR response for device-to-host data FIS
> 0x0004  2            0  R_ERR response for host-to-device data FIS
> 0x0005  2            0  R_ERR response for non-data FIS
> 0x0006  2            0  R_ERR response for device-to-host non-data FIS
> 0x0007  2            0  R_ERR response for host-to-device non-data FIS
> 0x000a  2            1  Device-to-host register FISes sent due to a COMRESET
> 0x8000  4       185377  Vendor specific

You chopped out the important part. Post the whole thing.


>> smartctl -l scterc /dev/sdd
> 
> SCT Error Recovery Control:
>           Read: Disabled
>          Write: Disabled

It's possible for the drive's recovery on a troublesome sector to take longer than the SCSI command timer value, which is 30 seconds by default. This is a kernel function. You can check it with cat, or change it by echoing a value to /sys/block/<device-name>/device/timeout.

You'd have to consult the model spec to find out what the drive's timeout is, but you want the kernel to wait at least, say, a second longer than the drive. So if the drive waits up to 120 seconds, then have the kernel wait 121 seconds. Otherwise what happens is you get a reset instead of this:

end_request: I/O error, dev sdd, sector 1487873214

That's important because it's how to know what to write good data back to (once supported). If a reset happens first, this information is lost. So it's not related to this problem but you'll want to change the command timer value.
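
A sketch of that adjustment (the device name sdd is from the thread; the 120-second drive timeout is an assumption -- substitute your model's documented worst-case recovery time):

```shell
#!/bin/sh
# Sketch: raise the kernel's SCSI command timer above the drive's own
# error-recovery timeout, so the drive gets to report the UNC error and
# sector address before the link is reset. Values are assumptions.
dev="sdd"
drive_timeout=120                       # drive's worst-case recovery time (assumed)
kernel_timeout=$((drive_timeout + 1))   # kernel should outwait the drive by >= 1s
sysfs="/sys/block/$dev/device/timeout"

# Check the current value (default is 30 seconds):
cat "$sysfs" 2>/dev/null || echo "no such device on this machine"

# Raise it (needs root). The alternative, if the drive supports SCT ERC,
# is to cap the drive's own recovery time instead, e.g.:
#   smartctl -l scterc,70,70 /dev/sdd   # 7.0s read/write limit (deciseconds)
echo "$kernel_timeout" 2>/dev/null > "$sysfs" || \
    echo "would write $kernel_timeout to $sysfs"
```

The sysfs value does not persist across reboots, so on a real system this would go in a udev rule or boot script.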

> 
>> btrfs device stats /dev/X
> 
> All drives except /dev/sdf1 have zeroes for all values. /dev/sdf1
> reports that same read error from the logs:
> 
> [/dev/sdf1].write_io_errs   0
> [/dev/sdf1].read_io_errs    1
> [/dev/sdf1].flush_io_errs   0
> [/dev/sdf1].corruption_errs 0
> [/dev/sdf1].generation_errs 0

Yeah I'm confused. Maybe the entire dmesg would be useful; or two separate ones:
dmesg | grep -i sdd
dmesg | grep -i sdf

Maybe there's another read error floating around here somewhere…


Chris Murphy


