* Unrecovered read error issue
@ 2015-12-19  1:26 Viacheslav Dubeyko
       [not found] ` <1450488372.2652.25.camel-dzAnj6fV1RzTdvqWZYKEhEEK6ufn8VP3@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Viacheslav Dubeyko @ 2015-12-19  1:26 UTC (permalink / raw)
  To: Ryusuke Konishi; +Cc: Brian Cottingham, linux-nilfs

Hi Ryusuke,

Recently, Brian Cottingham <spiffytech-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> reported an issue
with the NILFS2 GC. He shared the following environment and issue details:

Linux spiffyhome 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1+deb8u6
(2015-11-09) x86_64 GNU/Linux
nilfs-tools 2.2.1-1

This partition is used for bulk media storage and to hold backups from
my other devices. Pretty low-use; it mostly just sits there waiting
for new data.

The drive is an HDD, purchased 2014-09-04:
http://smile.amazon.com/dp/B00EHBEUZO/ref=pe_385040_121528360_TE_dp_5?sa-no-redirect=1

Model: ATA WDC WD40EZRX-00S (scsi)
Disk /dev/sdb: 4001GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name  Flags
 1      1049kB  4001GB  4001GB  nilfs2

Disk /dev/sdb: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes

Dec 17 16:02:13 spiffyhome kernel: [175681.852060] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Dec 17 16:02:13 spiffyhome kernel: [175681.852066] ata2.00: BMDMA stat 0x25
Dec 17 16:02:13 spiffyhome kernel: [175681.852070] ata2.00: failed command: READ DMA EXT
Dec 17 16:02:13 spiffyhome kernel: [175681.852077] ata2.00: cmd 25/00:00:40:b0:fc/00:04:5a:00:00/e0 tag 0 dma 524288 in
Dec 17 16:02:13 spiffyhome kernel: [175681.852077]          res 51/40:4f:f0:b2:fc/40:01:5a:00:00/e0 Emask 0x9 (media error)
Dec 17 16:02:13 spiffyhome kernel: [175681.852081] ata2.00: status: { DRDY ERR }
Dec 17 16:02:13 spiffyhome kernel: [175681.852083] ata2.00: error: { UNC }
Dec 17 16:02:14 spiffyhome kernel: [175681.880266] ata2.00: configured for UDMA/133
Dec 17 16:02:14 spiffyhome kernel: [175681.880680] sd 1:0:0:0: [sdb] Unhandled sense code
Dec 17 16:02:14 spiffyhome kernel: [175681.880683] sd 1:0:0:0: [sdb]
Dec 17 16:02:14 spiffyhome kernel: [175681.880685] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec 17 16:02:14 spiffyhome kernel: [175681.880688] sd 1:0:0:0: [sdb]
Dec 17 16:02:14 spiffyhome kernel: [175681.880689] Sense Key : Medium Error [current] [descriptor]
Dec 17 16:02:14 spiffyhome kernel: [175681.880692] Descriptor sense data with sense descriptors (in hex):
Dec 17 16:02:14 spiffyhome kernel: [175681.880694]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Dec 17 16:02:14 spiffyhome kernel: [175681.880701]         5a fc b2 f0
Dec 17 16:02:14 spiffyhome kernel: [175681.880705] sd 1:0:0:0: [sdb]
Dec 17 16:02:14 spiffyhome kernel: [175681.880707] Add. Sense: Unrecovered read error - auto reallocate failed
Dec 17 16:02:14 spiffyhome kernel: [175681.880709] sd 1:0:0:0: [sdb] CDB:
Dec 17 16:02:14 spiffyhome kernel: [175681.880711] Read(16): 88 00 00 00 00 00 5a fc b0 40 00 00 04 00 00 00
Dec 17 16:02:14 spiffyhome kernel: [175681.880720] end_request: I/O error, dev sdb, sector 1526510320
Dec 17 16:02:14 spiffyhome kernel: [175681.880756] ata2: EH complete
Dec 17 16:02:14 spiffyhome kernel: [175681.880916] NILFS: GC failed during preparation: cannot read source blocks: err=-5

So, the log shows that the root cause is an unrecoverable read error on
the HDD side. The bad part is that GC stops on every start because it
hits the same I/O error again and again. As a result, aged segments are
never reclaimed and the volume's free space is eventually exhausted.
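
For reference, the failing sector can be decoded from the sense data and
the Read(16) CDB in the log above. The following is only a sketch: the
partition start (sector 2048, guessed from the "1049kB" start offset) and
the 4096-byte file system block size are assumptions, not values taken
from the report.

```python
# Bytes copied from the kernel log in the report.
SENSE = bytes.fromhex("72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 5a fc b2 f0")
CDB = bytes.fromhex("88 00 00 00 00 00 5a fc b0 40 00 00 04 00 00 00")

LOGICAL_SECTOR = 512       # logical sector size from the report
PHYSICAL_SECTOR = 4096     # physical sector size from the report
PART_START_SECTOR = 2048   # ASSUMPTION: "1049kB" start means 1 MiB = sector 2048
FS_BLOCK_SIZE = 4096       # ASSUMPTION: file system block size

# Descriptor-format sense data: 8-byte header, then an information
# descriptor whose last 8 bytes hold the failing LBA (big-endian).
failing_lba = int.from_bytes(SENSE[12:20], "big")

# Read(16): bytes 2..9 are the starting LBA, bytes 10..13 the length in sectors.
request_lba = int.from_bytes(CDB[2:10], "big")
request_sectors = int.from_bytes(CDB[10:14], "big")

offset_in_request = failing_lba - request_lba
request_bytes = request_sectors * LOGICAL_SECTOR

# Block index relative to the partition start (under the assumptions above).
fs_block = (failing_lba - PART_START_SECTOR) * LOGICAL_SECTOR // FS_BLOCK_SIZE

print("failing LBA:", failing_lba)
print("request:", request_lba, "+", request_sectors, "sectors =", request_bytes, "bytes")
print("offset of bad sector within request:", offset_in_request)
print("partition-relative block (assumed layout):", fs_block)
```

The decoded LBA agrees with the "end_request: I/O error, dev sdb, sector
1526510320" line, and the request size agrees with "dma 524288 in".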

From one point of view, the GC behavior is correct: GC encounters an I/O
error caused by an external factor and stops. But from the end user's
point of view this behavior is completely wrong, because a bad sector is
not critical enough to justify stopping GC and file system operations.
The ideal solution could be an erasure coding scheme, but even erasure
coding cannot guarantee that such issues are always resolved. Moreover,
the probability of encountering an error on the drive side is much
higher for modern HDDs with huge capacity (several TBs) and for modern
SSDs. So, it makes sense to implement a simple solution for handling
such issues on the GC side. One possible solution is to return a zeroed
block for moving and to inform the end user about the issue in syslog.
Another way is to inform the user about the issue and to provide a
user-space tool for recovering the volume state. But again, recovery
would simply amount to moving a zeroed block.
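
The first option (substitute a zeroed block and log the event) could be
sketched roughly as below. This is not NILFS2 code: read_fn, the function
names, and the 4096-byte block size are hypothetical and only illustrate
the control flow in user-space Python.

```python
import errno
import logging

log = logging.getLogger("gc-sketch")
BLOCK_SIZE = 4096  # ASSUMPTION: file system block size


def read_block_for_gc(read_fn, blocknr, block_size=BLOCK_SIZE):
    """Return (data, zero_filled) for one block that GC wants to move.

    read_fn(blocknr) is a hypothetical callable that returns the block's
    bytes or raises OSError with errno EIO on a media error.
    """
    try:
        return read_fn(blocknr), False
    except OSError as exc:
        if exc.errno != errno.EIO:
            raise  # only media errors are papered over
        log.warning("GC: unrecoverable read error at block %d, "
                    "moving a zeroed block instead", blocknr)
        return bytes(block_size), True


def collect_segment(read_fn, blocknrs):
    """Gather the blocks of one segment and list those replaced by zeros."""
    moved, lost = [], []
    for nr in blocknrs:
        data, zero_filled = read_block_for_gc(read_fn, nr)
        moved.append(data)
        if zero_filled:
            lost.append(nr)
    return moved, lost
```

With this shape, GC keeps making progress on the segment, and the list of
lost block numbers is what would be reported to the user.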

So, what do you think about this issue? What feasible and easy solution
do you see? We don't have the resources for a long-term implementation,
so we need some easy hack for it.

Thanks,
Vyacheslav Dubeyko.


--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: Unrecovered read error issue
       [not found] ` <1450488372.2652.25.camel-dzAnj6fV1RzTdvqWZYKEhEEK6ufn8VP3@public.gmane.org>
@ 2015-12-19  7:17   ` Paul Fertser
       [not found]     ` <20151219071703.GP3706-VZXms658p7SUKArSQO4KrA@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Paul Fertser @ 2015-12-19  7:17 UTC (permalink / raw)
  To: Viacheslav Dubeyko; +Cc: Ryusuke Konishi, Brian Cottingham, linux-nilfs

Hey Viacheslav,

On Fri, Dec 18, 2015 at 05:26:12PM -0800, Viacheslav Dubeyko wrote:
> So, what do you think about such issue? What possible and easy solution
> do you see? We haven't opportunity for long-term implementation and we
> need in some easy hack for it. What do you think?

I reported this issue several years ago (in my case GC was choking on
bad sectors of an eMMC), and no hack was considered to be necessary; I
had to resort to manually dd 4096 zero bytes to the affected block
every time I faced it.
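
The manual workaround amounts to overwriting one block with zeros at a
known offset. A rough Python equivalent is sketched below; the function
name and the 4096-byte default are illustrative assumptions, and the
caller has to work out the right block index for the device or partition
being written.

```python
import os


def zero_block(path, block_index, block_size=4096):
    """Overwrite a single block of `path` with zeros at the given index.

    `path` may be a regular file or a block device; the offset is
    block_index * block_size bytes from the start of `path`.
    Returns the number of bytes written.
    """
    fd = os.open(path, os.O_WRONLY)
    try:
        written = os.pwrite(fd, bytes(block_size), block_index * block_size)
        os.fsync(fd)  # push the write out so the drive actually sees it
    finally:
        os.close(fd)
    return written
```

Nothing here knows whether the block holds user data or metadata, so the
overwritten content is simply lost.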

I'm glad to see this is finally getting some attention!

-- 
Be free, use free (http://www.gnu.org/philosophy/free-sw.html) software!
mailto:fercerpav-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org

* Re: Unrecovered read error issue
       [not found]     ` <20151219071703.GP3706-VZXms658p7SUKArSQO4KrA@public.gmane.org>
@ 2015-12-19 21:58       ` Vyacheslav Dubeyko
       [not found]         ` <1450562290.2693.9.camel-RC04EVaD3rlUdzgqiOAiT0EK6ufn8VP3@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Vyacheslav Dubeyko @ 2015-12-19 21:58 UTC (permalink / raw)
  To: Paul Fertser; +Cc: Ryusuke Konishi, Brian Cottingham, linux-nilfs

Hi Paul,

On Sat, 2015-12-19 at 10:17 +0300, Paul Fertser wrote:
> Hey Viacheslav,
> 
> On Fri, Dec 18, 2015 at 05:26:12PM -0800, Viacheslav Dubeyko wrote:
> > So, what do you think about such issue? What possible and easy solution
> > do you see? We haven't opportunity for long-term implementation and we
> > need in some easy hack for it. What do you think?
> 
> I reported this issue several years ago (in my case GC was choking on
> bad sectors of an eMMC), and no hack was considered to be necessary; I
> had to resort to manually dd 4096 zero bytes to the affected block
> every time I faced it.
> 
> I'm glad to see this is finally getting some attention!
> 

I think I've encountered a similar problem with a Seagate hybrid HDD
too.

I suppose that an eMMC or SSD will reassign a bad logical sector to a
valid physical page on a write operation. But I am not sure that a bad
sector will be reassigned on write in the HDD case. File systems had a
special bad sectors table decades ago, but modern file systems consider
such a table an unnecessary metadata structure.

So, I assume such errors need to be handled for write operations as well
as for read operations.

Thanks,
Vyacheslav Dubeyko.



* Re: Unrecovered read error issue
       [not found]         ` <1450562290.2693.9.camel-RC04EVaD3rlUdzgqiOAiT0EK6ufn8VP3@public.gmane.org>
@ 2015-12-20 18:34           ` Clemens Eisserer
       [not found]             ` <CAFvQSYTHYypic+bjUOxK=F81XgA1zOvviKkAKQuJ2fD==tn02w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Clemens Eisserer @ 2015-12-20 18:34 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Vyacheslav,


> But I suppose that eMMC or SSD will do re-assignment of bad logical
> sector into valid physical page on write operation. And I am not sure
> that bad sector will be re-assigned on write operation for HDD case.

Modern (SMART-capable) HDDs behave exactly the same - they have spare
sectors and remap bad sectors when writing them. I "fixed" a laptop
HDD once, which had a head-crash at the place where ext4's journal was
located - a simple overwrite with dd fixed the drive.

Best regards, Clemens

* Re: Unrecovered read error issue
       [not found]             ` <CAFvQSYTHYypic+bjUOxK=F81XgA1zOvviKkAKQuJ2fD==tn02w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-12-20 21:55               ` Vyacheslav Dubeyko
  0 siblings, 0 replies; 5+ messages in thread
From: Vyacheslav Dubeyko @ 2015-12-20 21:55 UTC (permalink / raw)
  To: Clemens Eisserer; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Clemens,

On Sun, 2015-12-20 at 19:34 +0100, Clemens Eisserer wrote:
> Hi Vyacheslav,
> 
> 
> > But I suppose that eMMC or SSD will do re-assignment of bad logical
> > sector into valid physical page on write operation. And I am not sure
> > that bad sector will be re-assigned on write operation for HDD case.
> 
> Modern (SMART-capable) HDDs behave exactly the same - they have spare
> sectors and remap bad sectors when writing them. I "fixed" a laptop
> HDD once, which had a head-crash at the place where ext4's journal was
> located - a simple overwrite with dd fixed the drive.
> 

Yes, such remapping technology should exist for HDDs too. But an HDD's
capacity for remapping bad sectors is more limited because it cannot
reserve a significant number of spare sectors; available free space is
a really important feature. Maybe it is possible to compare the size of
an SSD's overprovisioning with an HDD's reserved spare sectors space.
But an SSD's FTL is able to use any NAND erase block for remapping, and
I suppose that an HDD's reserved spare sectors space is simply a special
contiguous area of sectors. So, I think the probability of exhausting
the reserved spare sectors is higher in the HDD case. It means that the
probability of encountering a write error because a bad sector cannot
be remapped is also higher in the HDD case, from my point of view.

Another bad thing about using dd for cleaning bad sectors is that it is
impossible to distinguish what you are rewriting, because you may be
rewriting user data as well as metadata. Zeroing some space in the ext4
journal is not a very critical action: you will simply lose one or
several transactions in the journal. The file system will be in a
consistent state anyway, although you could lose some user data. But if
you use dd to rewrite an arbitrary sector of a file system volume, then
the consequences are hard to forecast. So, the best way is to track and
handle the occurrence of unrecoverable read errors on the file system
side.

Thanks,
Vyacheslav Dubeyko.

