From: Pavel Machek <pavel@ucw.cz>
To: "Theodore Ts'o" <tytso@mit.edu>,
	kernel list <linux-kernel@vger.kernel.org>,
	adilger.kernel@dilger.ca, linux-ext4@vger.kernel.org
Subject: Re: ext4: media error but where?
Date: Mon, 7 Jul 2014 20:55:43 +0200
Message-ID: <20140707185543.GA26056@amd.pavel.ucw.cz>
In-Reply-To: <20140707010002.GD471@thunk.org>

On Sun 2014-07-06 21:00:02, Theodore Ts'o wrote:
> On Sun, Jul 06, 2014 at 11:37:11PM +0200, Pavel Machek wrote:
> > 
> > Well, when I got report about hw problems, badblocks -c was my first
> > instinct. On the usb hdd, the most errors were due to 3.16-rc1 kernel
> > bug, not real problems.
> 
> The problem is with modern disk drives, this is a *wrong* instinct.
> That's my point.  In general, trying to mess with the bad blocks list
> in the ext2/3/4 file system is just not the right thing to do with
> modern disk drives.  That's because with modern disk drives, the hard
> drives will do bad block remapping.

Actually... I believe it was the right instinct. 

If I just wanted to get the data off... remounting read-only would be
the way to go, then backing it up using dd_rescue. But that way I'd
turn bad sectors into silent data corruption.

If I wanted to recover data from that partition properly, fsck -c (or
badblocks, but that's trickier) first and then dd_rescue would be the
way to go.
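
To make the silent-corruption point concrete, here is a minimal Python
sketch of a rescue-style copy. It is not how dd_rescue is implemented;
rescue_copy, read_at and the zero-fill policy are names and choices
assumed purely for illustration. Unreadable blocks end up as zeros in
the image, so unless the failing offsets are recorded somewhere (or the
blocks were marked bad beforehand), nothing in the copy says which data
is bogus.

```python
def rescue_copy(read_at, size, block_size=4096):
    """Copy `size` bytes via read_at(offset, length), zero-filling bad blocks.

    read_at is a hypothetical callable that returns exactly `length` bytes,
    or raises OSError (e.g. EIO) when the underlying sectors are unreadable.
    Returns (image, bad_offsets).
    """
    image = bytearray()
    bad_offsets = []
    offset = 0
    while offset < size:
        length = min(block_size, size - offset)
        try:
            chunk = read_at(offset, length)
        except OSError:
            # Unreadable block: pad with zeros so later offsets stay aligned,
            # and remember where the hole is.
            chunk = b"\0" * length
            bad_offsets.append(offset)
        image += chunk
        offset += length
    return bytes(image), bad_offsets
```

Dropping the bad_offsets list from this sketch gives exactly the
failure mode described above: a complete-looking image with holes that
nobody knows about.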

> Basically, with modern disks, if the HDD has a hard ECC error, it will
> return an error --- but if you write to the sector, it will either
> rewrite onto that location on the platter, or if that part of the
> platter is truly gone, it will remap to the bad block spare pool.  So
> telling the disk to never use that block again isn't going to be the
> right answer.

Actually, a tool to do relocations would be nice. It is not exactly
easy to do it right by hand.
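
Part of what makes it fiddly by hand is the arithmetic between the
sector number the kernel or SMART reports and the filesystem block that
contains it. A rough Python sketch of that step follows; the function
names and defaults are assumptions for illustration (512-byte sectors,
4096-byte filesystem blocks, partition start taken from `fdisk -l`,
block size from `tune2fs -l`).

```python
def lba_to_fs_block(lba, part_start_sector, fs_block_size=4096, sector_size=512):
    """Map an absolute disk sector (LBA) to a filesystem block number.

    All parameters are illustrative: part_start_sector is the partition's
    first sector, fs_block_size is the filesystem's block size in bytes.
    """
    if lba < part_start_sector:
        raise ValueError("sector lies before the start of the partition")
    byte_offset = (lba - part_start_sector) * sector_size
    return byte_offset // fs_block_size


def fs_block_to_lbas(block, part_start_sector, fs_block_size=4096, sector_size=512):
    """Return the range of disk sectors covered by one filesystem block."""
    sectors_per_block = fs_block_size // sector_size
    first = part_start_sector + block * sectors_per_block
    return range(first, first + sectors_per_block)
```

The resulting block number is what debugfs expects for `icheck` (block
to inode), and `ncheck` then maps the inode to a path, which tells you
which file to restore before overwriting the sectors.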

I know the theory. I had 5 read-error incidents this year.

#1: Seagate refuses to reallocate sectors. Not sure why, I tried
 pretty much everything.

#2: 3.16-rc1 produces incorrect errors every 4GB, leading to "bad
sectors" that disappear with other kernels.

#3: Some more bad sectors appear on the Seagate

#4: Kernel on thinkpad reports errors in daily check, which is strange
 because there's nothing in SMART.

#5: Some old IDE hdd has bad sectors in unused or unimportant areas. 

In #5 the theory might match the reality (I did not check, I trashed
the disks).
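
For cases like #2, a quick sanity check on the reported sector numbers
can hint at whether the errors come from the media at all. The sketch
below assumes "every 4GB" means the failing sectors sit roughly 4 GiB
apart; looks_periodic, the 512-byte sector size and the slack value are
all assumptions for illustration, not something taken from the kernel
logs.

```python
FOUR_GIB = 4 * 1024**3


def looks_periodic(bad_sectors, period_bytes=FOUR_GIB, sector_size=512,
                   slack_sectors=8):
    """True if the bad sectors line up at multiples of the period.

    Every sector must lie within slack_sectors of (first sector + k * period),
    and the sectors must span at least one full period.
    """
    period = period_bytes // sector_size
    sectors = sorted(set(bad_sectors))
    if len(sectors) < 2 or sectors[-1] - sectors[0] < period - slack_sectors:
        return False
    base = sectors[0]
    for sector in sectors[1:]:
        remainder = (sector - base) % period
        # Distance to the nearest multiple of the period.
        if min(remainder, period - remainder) > slack_sectors:
            return False
    return True
```

Real media damage tends to be clustered or scattered rather than
aligned to a power-of-two boundary, so a True result here is a reason
to retest with another kernel before touching the bad blocks list.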

> The badblocks approach to dealing with hardware problems made sense
> back when we had IDE disks.  But that's been over a decade ago.  These
> days, it's horribly obsolete.

Forcing reallocation is hard & tricky. You may want to simply mark the
sector bad and lose a tiny bit of disk space... And even if you want to
force reallocation, you want to do fsck -c first, and restore the
affected files from backup.

									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
