* Please add more info to dmesg output on I/O error
@ 2017-03-01 16:04 Timofey Titovets
  2017-03-01 19:38 ` Kai Krakow
  2017-03-02  0:35 ` Chris Murphy
  0 siblings, 2 replies; 5+ messages in thread
From: Timofey Titovets @ 2017-03-01 16:04 UTC (permalink / raw)
  To: linux-btrfs

Hi, today I tried to move my FS from an old HDD to a new SSD.
During the move I hit an I/O error and the device remove operation was canceled.

Dmesg:
[ 1015.010241] blk_update_request: I/O error, dev sda, sector 81353664
[ 1015.010246] BTRFS error (device sdb1): bdev /dev/sda1 errs: wr 0, rd 23, flush 0, corrupt 0, gen 0
[ 1015.010282] ata5: EH complete
[ 1017.016721] ata5.00: exception Emask 0x0 SAct 0x10000 SErr 0x0 action 0x0
[ 1017.016730] ata5.00: irq_stat 0x40000008
[ 1017.016737] ata5.00: failed command: READ FPDMA QUEUED
[ 1017.016748] ata5.00: cmd 60/08:80:c0:5b:d9/00:00:04:00:00/40 tag 16 ncq dma 4096 in
                       res 41/40:00:c0:5b:d9/00:00:04:00:00/40 Emask 0x409 (media error) <F>
[ 1017.016754] ata5.00: status: { DRDY ERR }
[ 1017.016757] ata5.00: error: { UNC }
[ 1017.029479] ata5.00: configured for UDMA/133
[ 1017.029506] sd 4:0:0:0: [sda] tag#16 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[ 1017.029511] sd 4:0:0:0: [sda] tag#16 Sense Key : 0x3 [current]
[ 1017.029516] sd 4:0:0:0: [sda] tag#16 ASC=0x11 ASCQ=0x4
[ 1017.029520] sd 4:0:0:0: [sda] tag#16 CDB: opcode=0x28 28 00 04 d9 5b c0 00 00 08 00

For now I have fixed this problem by scrubbing the FS and deleting the
damaged files, but scrub is slow, and it would be more helpful if btrfs
showed more info on an I/O error,
i.e. something like what I get from scrub:
[ 1260.559180] BTRFS warning (device sdb1): i/o error at logical 40569896960 on dev /dev/sda1, sector 81351616, root 309, inode 55135, offset 71278592, length 4096, links 1 (path: nefelim4ag/.config/skypeforlinux/Cache/data_3)

Thanks.
-- 
Have a nice day,
Timofey.


* Re: Please add more info to dmesg output on I/O error
  2017-03-01 16:04 Please add more info to dmesg output on I/O error Timofey Titovets
@ 2017-03-01 19:38 ` Kai Krakow
  2017-03-02  0:40   ` Chris Murphy
  2017-03-02  0:35 ` Chris Murphy
  1 sibling, 1 reply; 5+ messages in thread
From: Kai Krakow @ 2017-03-01 19:38 UTC (permalink / raw)
  To: linux-btrfs

On Wed, 1 Mar 2017 19:04:26 +0300,
Timofey Titovets <nefelim4ag@gmail.com> wrote:

> Hi, today i try move my FS from old HDD to new SSD
> While processing i catch I/O error and device remove operation was
> canceled
> 
> Dmesg:
> [ 1015.010241] blk_update_request: I/O error, dev sda, sector 81353664
> [ 1015.010246] BTRFS error (device sdb1): bdev /dev/sda1 errs: wr 0,
> rd 23, flush 0, corrupt 0, gen 0
> [ 1015.010282] ata5: EH complete
> [ 1017.016721] ata5.00: exception Emask 0x0 SAct 0x10000 SErr 0x0 action 0x0
> [ 1017.016730] ata5.00: irq_stat 0x40000008
> [ 1017.016737] ata5.00: failed command: READ FPDMA QUEUED
> [ 1017.016748] ata5.00: cmd 60/08:80:c0:5b:d9/00:00:04:00:00/40 tag 16
> ncq dma 4096 in
>                        res 41/40:00:c0:5b:d9/00:00:04:00:00/40 Emask
> 0x409 (media error) <F>
> [ 1017.016754] ata5.00: status: { DRDY ERR }
> [ 1017.016757] ata5.00: error: { UNC }
> [ 1017.029479] ata5.00: configured for UDMA/133
> [ 1017.029506] sd 4:0:0:0: [sda] tag#16 UNKNOWN(0x2003) Result:
> hostbyte=0x00 driverbyte=0x08
> [ 1017.029511] sd 4:0:0:0: [sda] tag#16 Sense Key : 0x3 [current]
> [ 1017.029516] sd 4:0:0:0: [sda] tag#16 ASC=0x11 ASCQ=0x4
> [ 1017.029520] sd 4:0:0:0: [sda] tag#16 CDB: opcode=0x28 28 00 04 d9
> 5b c0 00 00 08 00
> 
> At now, i fixed this problem by doing scrub FS and delete damaged
> files, but scrub are slow, and if btrfs show me a more info on I/O
> error, it's will be more helpful
> i.e. something like i getting by scrub:
> [ 1260.559180] BTRFS warning (device sdb1): i/o error at logical
> 40569896960 on dev /dev/sda1, sector 81351616, root 309, inode 55135,
> offset 71278592, length 4096, links 1 (path:
> nefelim4ag/.config/skypeforlinux/Cache/data_3)
> 
> Thanks.

You should turn off SCT ERC with smartctl or set it to lower values, or,
if that doesn't work with your HDD firmware, increase the timeout of
the SCSI driver above 120s. This setup, as it is, is not going to work
correctly with btrfs in case of errors.

# smartctl -l scterc,70,70 /dev/sdb

should do the trick if supported. It applies an error correction
timeout of 7 seconds for reading and writing, which is below the kernel
scsi layer timeout of 30 seconds. Otherwise, your drive will fail to
respond for two minutes until the kernel resets the drive. According to
dmesg, this is what happened.

NAS-ready drives usually support this setting, while desktop drives
don't or at least default to standard desktop timeouts.
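
The relationship between the two timeouts described above can be
checked with a small sketch. This is illustrative Python, not something
smartctl or the kernel provides: the function names are made up, and it
assumes the SCSI command timeout (in seconds) is exposed at
/sys/block/<dev>/device/timeout on the system in question.

```python
from pathlib import Path


def read_kernel_timeout(dev: str, sysfs_root: str = "/sys/block") -> int:
    """Return the SCSI command timeout in seconds for a device such as 'sda'.

    Assumes the value lives at <sysfs_root>/<dev>/device/timeout.
    """
    return int(Path(sysfs_root, dev, "device", "timeout").read_text().strip())


def erc_below_kernel_timeout(erc_deciseconds: int, kernel_timeout_s: int) -> bool:
    """True if the drive gives up and reports the error before the kernel times out.

    SCT ERC values are expressed in deciseconds (70 means 7.0 seconds).
    """
    return erc_deciseconds / 10 < kernel_timeout_s


# scterc,70,70 against a 30 second kernel timeout
print(erc_below_kernel_timeout(70, 30))  # True
```

On a real machine, read_kernel_timeout("sda") would supply the second
argument instead of the hard-coded 30.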

-- 
Regards,
Kai

Replies to list-only preferred.



* Re: Please add more info to dmesg output on I/O error
  2017-03-01 16:04 Please add more info to dmesg output on I/O error Timofey Titovets
  2017-03-01 19:38 ` Kai Krakow
@ 2017-03-02  0:35 ` Chris Murphy
  1 sibling, 0 replies; 5+ messages in thread
From: Chris Murphy @ 2017-03-02  0:35 UTC (permalink / raw)
  To: Timofey Titovets; +Cc: linux-btrfs

On Wed, Mar 1, 2017 at 9:04 AM, Timofey Titovets <nefelim4ag@gmail.com> wrote:
> Hi, today i try move my FS from old HDD to new SSD
> While processing i catch I/O error and device remove operation was canceled
>
> Dmesg:
> [ 1015.010241] blk_update_request: I/O error, dev sda, sector 81353664
> [ 1015.010246] BTRFS error (device sdb1): bdev /dev/sda1 errs: wr 0,
> rd 23, flush 0, corrupt 0, gen 0
> [ 1015.010282] ata5: EH complete
> [ 1017.016721] ata5.00: exception Emask 0x0 SAct 0x10000 SErr 0x0 action 0x0
> [ 1017.016730] ata5.00: irq_stat 0x40000008
> [ 1017.016737] ata5.00: failed command: READ FPDMA QUEUED
> [ 1017.016748] ata5.00: cmd 60/08:80:c0:5b:d9/00:00:04:00:00/40 tag 16
> ncq dma 4096 in
>                        res 41/40:00:c0:5b:d9/00:00:04:00:00/40 Emask
> 0x409 (media error) <F>
> [ 1017.016754] ata5.00: status: { DRDY ERR }
> [ 1017.016757] ata5.00: error: { UNC }
> [ 1017.029479] ata5.00: configured for UDMA/133
> [ 1017.029506] sd 4:0:0:0: [sda] tag#16 UNKNOWN(0x2003) Result:
> hostbyte=0x00 driverbyte=0x08
> [ 1017.029511] sd 4:0:0:0: [sda] tag#16 Sense Key : 0x3 [current]
> [ 1017.029516] sd 4:0:0:0: [sda] tag#16 ASC=0x11 ASCQ=0x4
> [ 1017.029520] sd 4:0:0:0: [sda] tag#16 CDB: opcode=0x28 28 00 04 d9
> 5b c0 00 00 08 00

This is an error reported by the drive to libata. It's not a Btrfs
error or bug. The UNC suggests it's an uncorrectable error, so whether
Btrfs can compensate depends on whether there's redundancy for the
affected sector(s).
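
For reference, the SCSI lines in that dmesg excerpt can be decoded by
hand: Sense Key 0x3 is "medium error", ASC 0x11 / ASCQ 0x04 is
"unrecovered read error, auto reallocate failed", and the CDB is a
READ(10) command. A minimal Python sketch of the CDB decoding follows.
The helper name is hypothetical, and the byte count assumes 512-byte
logical sectors, which the "ncq dma 4096" line is consistent with.

```python
def decode_read10(cdb_hex: str) -> tuple[int, int]:
    """Decode a READ(10) CDB into (starting LBA, number of blocks)."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2..5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7..8: transfer length
    return lba, blocks


lba, blocks = decode_read10("28 00 04 d9 5b c0 00 00 08 00")
print(lba, blocks, blocks * 512)  # 81353664 8 4096
```

The decoded LBA matches the sector in the blk_update_request line, so
the failed command is a single 4 KiB read starting at that sector.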



> At now, i fixed this problem by doing scrub FS and delete damaged
> files, but scrub are slow, and if btrfs show me a more info on I/O
> error, it's will be more helpful
> i.e. something like i getting by scrub:
> [ 1260.559180] BTRFS warning (device sdb1): i/o error at logical
> 40569896960 on dev /dev/sda1, sector 81351616, root 309, inode 55135,
> offset 71278592, length 4096, links 1 (path:
> nefelim4ag/.config/skypeforlinux/Cache/data_3)

That suggests the problem is with data, not metadata. What is the data
and metadata profile?



-- 
Chris Murphy


* Re: Please add more info to dmesg output on I/O error
  2017-03-01 19:38 ` Kai Krakow
@ 2017-03-02  0:40   ` Chris Murphy
  2017-03-02  2:30     ` Timofey Titovets
  0 siblings, 1 reply; 5+ messages in thread
From: Chris Murphy @ 2017-03-02  0:40 UTC (permalink / raw)
  To: Kai Krakow; +Cc: Btrfs BTRFS

On Wed, Mar 1, 2017 at 12:38 PM, Kai Krakow <hurikhan77@gmail.com> wrote:
> On Wed, 1 Mar 2017 19:04:26 +0300,
> Timofey Titovets <nefelim4ag@gmail.com> wrote:
>
>> Hi, today i try move my FS from old HDD to new SSD
>> While processing i catch I/O error and device remove operation was
>> canceled
>>
>> Dmesg:
>> [ 1015.010241] blk_update_request: I/O error, dev sda, sector 81353664
>> [ 1015.010246] BTRFS error (device sdb1): bdev /dev/sda1 errs: wr 0,
>> rd 23, flush 0, corrupt 0, gen 0
>> [ 1015.010282] ata5: EH complete
>> [ 1017.016721] ata5.00: exception Emask 0x0 SAct 0x10000 SErr 0x0 action 0x0
>> [ 1017.016730] ata5.00: irq_stat 0x40000008
>> [ 1017.016737] ata5.00: failed command: READ FPDMA QUEUED
>> [ 1017.016748] ata5.00: cmd 60/08:80:c0:5b:d9/00:00:04:00:00/40 tag 16
>> ncq dma 4096 in
>>                        res 41/40:00:c0:5b:d9/00:00:04:00:00/40 Emask
>> 0x409 (media error) <F>
>> [ 1017.016754] ata5.00: status: { DRDY ERR }
>> [ 1017.016757] ata5.00: error: { UNC }
>> [ 1017.029479] ata5.00: configured for UDMA/133
>> [ 1017.029506] sd 4:0:0:0: [sda] tag#16 UNKNOWN(0x2003) Result:
>> hostbyte=0x00 driverbyte=0x08
>> [ 1017.029511] sd 4:0:0:0: [sda] tag#16 Sense Key : 0x3 [current]
>> [ 1017.029516] sd 4:0:0:0: [sda] tag#16 ASC=0x11 ASCQ=0x4
>> [ 1017.029520] sd 4:0:0:0: [sda] tag#16 CDB: opcode=0x28 28 00 04 d9
>> 5b c0 00 00 08 00
>>
>> At now, i fixed this problem by doing scrub FS and delete damaged
>> files, but scrub are slow, and if btrfs show me a more info on I/O
>> error, it's will be more helpful
>> i.e. something like i getting by scrub:
>> [ 1260.559180] BTRFS warning (device sdb1): i/o error at logical
>> 40569896960 on dev /dev/sda1, sector 81351616, root 309, inode 55135,
>> offset 71278592, length 4096, links 1 (path:
>> nefelim4ag/.config/skypeforlinux/Cache/data_3)
>>
>> Thanks.
>
> You should turn off SCT ERC with smartctl or set it to lower values,

Unlikely. The OP suggests single HDD to single SSD. Only if there is
redundancy is it appropriate to set SCT ERC to a low value like 70
deciseconds.

If it's a single drive, the thing to do is disable SCT ERC in the case
it's enabled (?) which might be what's going on, so that there's a
longer recovery time and maybe the drive figures out the problem and
recovers the data.

smartctl -l scterc /dev/sdX

That should report back the SCT ERC status. Don't change it until we
know the configuration.


-- 
Chris Murphy


* Re: Please add more info to dmesg output on I/O error
  2017-03-02  0:40   ` Chris Murphy
@ 2017-03-02  2:30     ` Timofey Titovets
  0 siblings, 0 replies; 5+ messages in thread
From: Timofey Titovets @ 2017-03-02  2:30 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Kai Krakow, Btrfs BTRFS

2017-03-02 3:40 GMT+03:00 Chris Murphy <lists@colorremedies.com>:
> On Wed, Mar 1, 2017 at 12:38 PM, Kai Krakow <hurikhan77@gmail.com> wrote:
>> On Wed, 1 Mar 2017 19:04:26 +0300,
>> Timofey Titovets <nefelim4ag@gmail.com> wrote:
>>
>>> Hi, today i try move my FS from old HDD to new SSD
>>> While processing i catch I/O error and device remove operation was
>>> canceled
>>>
>>> Dmesg:
>>> [ 1015.010241] blk_update_request: I/O error, dev sda, sector 81353664
>>> [ 1015.010246] BTRFS error (device sdb1): bdev /dev/sda1 errs: wr 0,
>>> rd 23, flush 0, corrupt 0, gen 0
>>> [ 1015.010282] ata5: EH complete
>>> [ 1017.016721] ata5.00: exception Emask 0x0 SAct 0x10000 SErr 0x0 action 0x0
>>> [ 1017.016730] ata5.00: irq_stat 0x40000008
>>> [ 1017.016737] ata5.00: failed command: READ FPDMA QUEUED
>>> [ 1017.016748] ata5.00: cmd 60/08:80:c0:5b:d9/00:00:04:00:00/40 tag 16
>>> ncq dma 4096 in
>>>                        res 41/40:00:c0:5b:d9/00:00:04:00:00/40 Emask
>>> 0x409 (media error) <F>
>>> [ 1017.016754] ata5.00: status: { DRDY ERR }
>>> [ 1017.016757] ata5.00: error: { UNC }
>>> [ 1017.029479] ata5.00: configured for UDMA/133
>>> [ 1017.029506] sd 4:0:0:0: [sda] tag#16 UNKNOWN(0x2003) Result:
>>> hostbyte=0x00 driverbyte=0x08
>>> [ 1017.029511] sd 4:0:0:0: [sda] tag#16 Sense Key : 0x3 [current]
>>> [ 1017.029516] sd 4:0:0:0: [sda] tag#16 ASC=0x11 ASCQ=0x4
>>> [ 1017.029520] sd 4:0:0:0: [sda] tag#16 CDB: opcode=0x28 28 00 04 d9
>>> 5b c0 00 00 08 00
>>>
>>> At now, i fixed this problem by doing scrub FS and delete damaged
>>> files, but scrub are slow, and if btrfs show me a more info on I/O
>>> error, it's will be more helpful
>>> i.e. something like i getting by scrub:
>>> [ 1260.559180] BTRFS warning (device sdb1): i/o error at logical
>>> 40569896960 on dev /dev/sda1, sector 81351616, root 309, inode 55135,
>>> offset 71278592, length 4096, links 1 (path:
>>> nefelim4ag/.config/skypeforlinux/Cache/data_3)
>>>
>>> Thanks.
>>
>> You should turn off SCT ERC with smartctl or set it to lower values,
>
> Unlikely. The OP suggests single HDD to single SSD. Only if there is
> redundancy is it appropriate to set SCT ERC to a low value like 70
> deciseconds.
>
> If it's a single drive, the thing to do is disable SCT ERC in the case
> it's enabled (?) which might be what's going on, so that there's a
> longer recovery time and maybe the drive figures out the problem and
> recovers the data.
>
> smartctl -l scterc /dev/sdX
>
> That should report back the SCT ERC status. Don't change it until we
> know the configuration.
>
>
> --
> Chris Murphy


JFYI:
Data: single
Metadata: dup
It's just a notebook with a 2.5" HDD.

Guys, thanks, but I don't need help solving the problem; I already
solved it by finding and removing the damaged data from the FS.
I am only saying that:
  the first message was generated after:
  # btrfs device remove /dev/sdXn <path>
  BTRFS error (device sdb1): bdev /dev/sda1 errs: wr 0, rd 23, flush 0, corrupt 0, gen 0
  It only notifies me that I have a read problem, but does not tell
  me "where".
  the second message was generated after:
  # btrfs scrub start /dev/sdXn
  and it is more useful:
  BTRFS warning (device sdb1): i/o error at logical 40569896960 on dev /dev/sda1, sector 81351616, root 309, inode 55135, offset 71278592, length 4096, links 1 (path: nefelim4ag/.config/skypeforlinux/Cache/data_3)
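
To show how much more the scrub line carries, here is a rough Python
sketch that pulls the fields out of it and compares its sector with the
one from the blk_update_request line. The regex and function name are
only an illustration written against this one log line, not a stable
format guarantee. The comparison also rests on an assumption: the two
sector numbers refer to the same block, with sda counted from the start
of the disk and /dev/sda1 counted from the start of the partition.

```python
import re

# Pattern written against the single scrub warning quoted in this thread.
SCRUB_RE = re.compile(
    r"i/o error at logical (?P<logical>\d+) on dev (?P<dev>\S+), "
    r"sector (?P<sector>\d+), root (?P<root>\d+), inode (?P<inode>\d+), "
    r"offset (?P<offset>\d+), length (?P<length>\d+), links (?P<links>\d+) "
    r"\(path: (?P<path>.+)\)"
)


def parse_scrub_warning(line: str) -> dict:
    """Return the fields of a scrub i/o error line; numbers become ints."""
    m = SCRUB_RE.search(line)
    if m is None:
        raise ValueError("not a scrub i/o error line")
    return {k: (v if k in ("dev", "path") else int(v))
            for k, v in m.groupdict().items()}


line = ("BTRFS warning (device sdb1): i/o error at logical 40569896960 on dev "
        "/dev/sda1, sector 81351616, root 309, inode 55135, offset 71278592, "
        "length 4096, links 1 (path: nefelim4ag/.config/skypeforlinux/Cache/data_3)")
info = parse_scrub_warning(line)

whole_disk_sector = 81353664  # from the blk_update_request line (dev sda)
print(info["path"])
print(whole_disk_sector - info["sector"])  # 2048
```

The difference of 2048 sectors is what one would expect if sda1 starts
at sector 2048, so both messages appear to describe the same bad block;
only the scrub one says which file it belongs to.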

I already understood from the first message that I have bad sectors on
the HDD, and SMART tells me that too.
I just want to replace the drive, and I understand the btrfs behaviour:
btrfs tries to keep the data safe and aborts the device delete operation
on an I/O error.
But btrfs, you are smart enough; please give me more info on the error:
what was stored on the corrupted sector?

If btrfs showed this message earlier (i.e. right after increasing the
error counter), it could have saved me time (~1.5 h spent trying to
understand what happened and doing a full FS scrub).

Thanks.

-- 
Have a nice day,
Timofey.
