From: Ondrej Zary <linux@rainbow-software.org>
To: Pavel Machek <pavel@ucw.cz>
Cc: Mark Lord <mlord@pobox.com>,
	Marcus Overhagen <marcus.overhagen@gmail.com>,
	kernel list <linux-kernel@vger.kernel.org>,
	linux-ide@vger.kernel.org, tj@kernel.org
Subject: Re: SATA hdd refuses to reallocate a sector?
Date: Mon, 24 Jun 2013 09:14:29 +0200
Message-ID: <201306240914.29502.linux@rainbow-software.org>
In-Reply-To: <20130623215100.GA7414@amd.pavel.ucw.cz>

On Sunday 23 June 2013, Pavel Machek wrote:
> On Sun 2013-06-23 17:27:52, Mark Lord wrote:
> > On 13-06-23 03:00 PM, Pavel Machek wrote:
> > > Thanks for the hint. (Insert rant about hdparm documentation
> > > explaining that it is bad idea, but not telling me _why_ is it bad
> > > idea. Can I expect cache consistency issues after that, or is it just
> > > simple "you are writing to the disk without any checks"? Plus, I guess
> > > documentation should mention what sector number is. I guess sectors
> > > are 512bytes for the old drives, but is it 512 or 4096 for new
> > > drives?)
> >
> > For ATA, use the "logical sector size".
> > For all existing drives out there, that's a 512 byte unit.
>
> I guessed so. (It would be good to actually document it, as well as
> documenting exactly why it is dangerous. Is it okay to send patches?)
>
> > > ...but it does not do the trick :-(. It behaves strangely as if it was
> > > still cached somewhere. Do I need to turn off the write back cache?
> >
> > No, it works just fine.  You probably have more than one bad sector.
> > After you see a read failure, run "smartctl -a" and look at the error
> > logs to see what sector the drive is choking on.
>
> Well, I definitely have more than one bad sector, but I did try to
> read exactly the same sector and it failed. See below.
>
> root@amd:~# hdparm --yes-i-know-what-i-am-doing  --read-sector
>  961237188 /dev/sda | uniq
>
> /dev/sda:
> FAILED: Input/output error
> reading sector 961237188:
> root@amd:~# hdparm --yes-i-know-what-i-am-doing  --write-sector
> 961237188 /dev/sda
>
> /dev/sda:
> re-writing sector 961237188: succeeded
> root@amd:~# hdparm --yes-i-know-what-i-am-doing  --read-sector
> 961237188 /dev/sda | uniq
>
> /dev/sda:
> FAILED: Input/output error
> reading sector 961237188:
> root@amd:~# hdparm --yes-i-know-what-i-am-doing  --write-sector
> 961237188 /dev/sda
>
> /dev/sda:
> re-writing sector 961237188: succeeded
> root@amd:~# hdparm --yes-i-know-what-i-am-doing  --read-sector
> 961237188 /dev/sda | uniq
>
> /dev/sda:
> reading sector 961237188: succeeded
> 0000 0000 0000 0000 0000 0000 0000 0000
> root@amd:~# dd if=/dev/sda4 of=/dev/zero bs=4096
> skip=$[8958947328/4096]
> dd: reading `/dev/sda4': Input/output error
> 102+0 records in
> 102+0 records out
> 417792 bytes (418 kB) copied, 6.12536 s, 68.2 kB/s
> root@amd:~# hdparm --yes-i-know-what-i-am-doing  --read-sector
> 961237188 /dev/sda | uniq
>
> /dev/sda:
> reading sector 961237188: succeeded
> 0000 0000 0000 0000 0000 0000 0000 0000
> root@amd:~# hdparm --yes-i-know-what-i-am-doing  --read-sector
> 961237188 /dev/sda | uniq
>
> /dev/sda:
> reading sector 961237188: succeeded
> 0000 0000 0000 0000 0000 0000 0000 0000
> root@amd:~# hdparm --yes-i-know-what-i-am-doing  --read-sector
> 961237188 /dev/sda | uniq
>
> /dev/sda:
> FAILED: Input/output error
> reading sector 961237188:
> root@amd:~#
>
> > Or just low-level format it all with "hdparm --security-erase".
>
> I'd like to understand what is going on there. I can mark the blocks
> as bad at ext3 level, but I'd really like to understand what is going
> on there, and if it is hw issue, sata issue or block layer issue.
>
> (Plus, given that remapping does not work, I'd be afraid that it will
> kill the disk for good).
>
> The disk is
>
> root@amd:~# smartctl -a /dev/sda
> smartctl 5.40 2010-07-12 r3124 [i686-pc-linux-gnu] (local build)
> Copyright (C) 2002-10 by Bruce Allen,
> http://smartmontools.sourceforge.net
>
> === START OF INFORMATION SECTION ===
> Model Family:     Seagate Momentus 5400.6 series
> Device Model:     ST9500325AS
> Serial Number:    5VE41HDA
> Firmware Version: 0001SDM1
> User Capacity:    500,107,862,016 bytes
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   8
> ATA Standard is:  ATA-8-ACS revision 4
> Local Time is:    Sun Jun 23 23:49:15 2013 CEST
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
>
> Thanks for support,
> 									Pavel

Being tired of using hdparm manually, I created a simple hdd_realloc utility
that reads the disk in big blocks (1 MB). When there's a read error, it re-reads
the failed block sector by sector and rewrites (with zeros) the sectors that
fail to read. It works fine for disks with just a couple of pending sectors.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

#define BLOCK_SIZE	1048576
#define SECTOR_SIZE	512

int main(int argc, char *argv[]) {
	if (argc < 2) {
		fprintf(stderr, "Usage: %s <device> [pos]\n", argv[0]);
		return 1;
	}
	int dev = open(argv[1], O_RDWR | O_DIRECT | O_SYNC);
	if (dev < 0) {	/* open() returns -1 on failure; any value >= 0 is a valid fd */
		perror("Unable to open device");
		return 2;
	}

	posix_fadvise(dev, 0, 0, POSIX_FADV_RANDOM);

	/* long long matches the %lld conversions used with these variables */
	long long startpos = 0, pos = 0;
	if (argc > 2) {
		sscanf(argv[2], "%lld", &startpos);
	}
	pos = startpos;
	char *buf = valloc(BLOCK_SIZE);
	char *zeros = valloc(SECTOR_SIZE);
	if (!buf || !zeros) {
		fprintf(stderr, "Memory allocation error\n");
		return 2;
	}
	memset(zeros, 0, SECTOR_SIZE);

	time_t starttime = time(NULL);

	while (1) {
		/* \r returns to the start of the line so the status overwrites itself */
		printf("\rPosition: %lld B (%lld MiB, %lld GiB, sector %lld), rate %lld MiB/s",
			(long long)pos, (long long)(pos / 1024 / 1024),
			(long long)(pos / 1024 / 1024 / 1024), (long long)(pos / SECTOR_SIZE),
			(long long)((pos - startpos) / 1024 / 1024 / ((time(NULL) - starttime) ? (time(NULL) - starttime) : 1)));
		fflush(stdout);	/* no newline is printed, so flush explicitly */
		lseek64(dev, pos, SEEK_SET);
		int count = read(dev, buf, BLOCK_SIZE);
		if (count == 0)	{/* EOF */
			printf("End of disk\n");
			break;
		}
		if (count < 0) { /* read error */
			printf("\n");
			perror("Read error");
			printf("Examining %lld\n", pos);
			for (int i = 0; i < BLOCK_SIZE/SECTOR_SIZE; i++) {
				lseek64(dev, pos, SEEK_SET);
				if (read(dev, buf, SECTOR_SIZE) < SECTOR_SIZE) {
					printf("Unable to read at %lld, rewriting...", pos);
					lseek64(dev, pos, SEEK_SET);
					int result = write(dev, zeros, SECTOR_SIZE);
					if (result < 0) {
						printf("write error\n");
					} else {
						lseek64(dev, pos, SEEK_SET);
						if (read(dev, buf, SECTOR_SIZE) < SECTOR_SIZE)
							printf("read error after rewrite\n");
						else
							printf("OK\n");
					}
				}
				pos += SECTOR_SIZE;
			}
		} else /* no error */
			pos += count;
	}

	return 0;
}
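
The optional [pos] argument is a byte offset, while hdparm and the smartctl
error log report sector numbers. A minimal sketch of the conversion, assuming
512-byte logical sectors and the 1 MB block size used above; the helper names
are hypothetical and not part of the utility:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define BLOCK_SIZE  1048576ULL	/* same 1 MB read size as the utility */
#define SECTOR_SIZE 512ULL	/* logical sector size assumed in this thread */

/* Byte offset of the first byte of logical sector lba (hypothetical helper). */
static uint64_t sector_to_byte(uint64_t lba)
{
	assert(lba <= UINT64_MAX / SECTOR_SIZE);	/* guard against overflow */
	return lba * SECTOR_SIZE;
}

/* Round a byte offset down to the start of the block that contains it. */
static uint64_t block_start(uint64_t pos)
{
	return pos - (pos % BLOCK_SIZE);
}

/* Print a [pos] value that starts the scan at the block holding sector lba. */
static void print_start_pos(uint64_t lba)
{
	printf("%llu\n", (unsigned long long)block_start(sector_to_byte(lba)));
}
```

For the sector from the quoted hdparm session, print_start_pos(961237188)
prints 492153339904. This only holds when the utility is run on the whole disk
(/dev/sda), not on a partition, since sector numbers are relative to the disk.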

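SECTOR_SIZE is hardcoded to 512, which matches the "logical sector size"
mentioned earlier in the thread. If one preferred to ask the kernel instead,
something like the following might do. It is only a sketch, assuming Linux and
the BLKSSZGET/BLKPBSZGET ioctls from <linux/fs.h>; get_sector_sizes is a
hypothetical helper, not part of the utility:

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/fs.h>	/* BLKSSZGET, BLKPBSZGET */
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Store the logical and physical sector sizes of the block device behind fd.
 * Returns 0 on success, or -1 if fd is not a block device or an ioctl fails
 * (errno is set by ioctl in that case).
 */
static int get_sector_sizes(int fd, int *logical, unsigned int *physical)
{
	assert(logical != NULL && physical != NULL);
	if (ioctl(fd, BLKSSZGET, logical) < 0)		/* logical sector size */
		return -1;
	if (ioctl(fd, BLKPBSZGET, physical) < 0)	/* physical sector size */
		return -1;
	return 0;
}
```

The logical size is the unit that sector numbers and O_DIRECT alignment refer
to; the physical size is informational here and may be larger (e.g. 4096) on
newer drives.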

-- 
Ondrej Zary
