From: NeilBrown <neilb@suse.de>
To: Michael Tokarev <mjt@tls.msk.ru>
Cc: linux-raid@vger.kernel.org
Subject: Re: raid1 repair does not repair errors?
Date: Tue, 22 Oct 2013 12:11:13 +1100	[thread overview]
Message-ID: <20131022121113.48958a0b@notabene.brown> (raw)
In-Reply-To: <526541CD.8000003@msgid.tls.msk.ru>


On Mon, 21 Oct 2013 19:01:33 +0400 Michael Tokarev <mjt@tls.msk.ru> wrote:

> Hello.
> 
> I have a raid1 array (composed of 4 drives, so it is a 4-fold
> copy of the data), and one of the drives has an unreadable (bad)
> sector in the partition belonging to this array.
> 
> When I run the md 'repair' action, it hits the bad spot, the
> kernel clearly returns an error, but md does not do anything
> with it.  For example:
> 
> Oct 21 18:43:55 mother kernel: [190018.073098] md: requested-resync of RAID array md1
> Oct 21 18:43:55 mother kernel: [190018.093910] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> Oct 21 18:43:55 mother kernel: [190018.114765] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for requested-resync.
> Oct 21 18:43:55 mother kernel: [190018.136459] md: using 128k window, over a total of 2096064k.
> Oct 21 18:45:11 mother kernel: [190094.091974] ata6.00: exception Emask 0x0 SAct 0xf SErr 0x0 action 0x0
> Oct 21 18:45:11 mother kernel: [190094.114093] ata6.00: irq_stat 0x40000008
> Oct 21 18:45:11 mother kernel: [190094.135906] ata6.00: failed command: READ FPDMA QUEUED
> Oct 21 18:45:11 mother kernel: [190094.157710] ata6.00: cmd 60/00:00:00:3b:3e/04:00:00:00:00/40 tag 0 ncq 524288 in
> Oct 21 18:45:11 mother kernel: [190094.157710]          res 41/40:00:29:3e:3e/00:00:00:00:00/40 Emask 0x409 (media error) <F>
> Oct 21 18:45:11 mother kernel: [190094.202315] ata6.00: status: { DRDY ERR }
> Oct 21 18:45:11 mother kernel: [190094.224517] ata6.00: error: { UNC }
> Oct 21 18:45:11 mother kernel: [190094.248920] ata6.00: configured for UDMA/133
> Oct 21 18:45:11 mother kernel: [190094.271003] sd 5:0:0:0: [sdc] Unhandled sense code
> Oct 21 18:45:11 mother kernel: [190094.293044] sd 5:0:0:0: [sdc]
> Oct 21 18:45:11 mother kernel: [190094.314654] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> Oct 21 18:45:11 mother kernel: [190094.336483] sd 5:0:0:0: [sdc]
> Oct 21 18:45:11 mother kernel: [190094.357966] Sense Key : Medium Error [current] [descriptor]
> Oct 21 18:45:11 mother kernel: [190094.379808] Descriptor sense data with sense descriptors (in hex):
> Oct 21 18:45:11 mother kernel: [190094.402024]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
> Oct 21 18:45:11 mother kernel: [190094.424502]         00 3e 3e 29
> Oct 21 18:45:11 mother kernel: [190094.446338] sd 5:0:0:0: [sdc]
> Oct 21 18:45:11 mother kernel: [190094.467995] Add. Sense: Unrecovered read error - auto reallocate failed
> Oct 21 18:45:11 mother kernel: [190094.490075] sd 5:0:0:0: [sdc] CDB:
> Oct 21 18:45:11 mother kernel: [190094.511870] Read(10): 28 00 00 3e 3b 00 00 04 00 00
> Oct 21 18:45:11 mother kernel: [190094.533829] end_request: I/O error, dev sdc, sector 4079145
> Oct 21 18:45:11 mother kernel: [190094.555800] ata6: EH complete
> Oct 21 18:45:22 mother kernel: [190105.602687] md: md1: requested-resync done.
> 
> There's no indication that the raid code tried to re-write the bad spot,
> and the bad block remains bad on the drive, so the next read (direct from
> the drive) returns the same I/O error with the same kernel messages.
> 
> Shouldn't the `repair' action re-write the problem spot?

Yes it should.
When end_sync_read() notices that BIO_UPTODATE isn't set, it refuses to set
R1BIO_Uptodate.
When sync_request_write() notices that R1BIO_Uptodate isn't set, it calls
fix_sync_read_error().

fix_sync_read_error() then calls sync_page_io() for each page in the region,
and if that fails (as you would expect), it goes on to the next disk and the
next until a working one is found.  Then that block is written back to all
those that failed.
fix_sync_read_error() doesn't report any success, but as it re-reads the
failing device you should see the SCSI read error reported a second time at
least.
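
To make the expected flow concrete, here is a rough user-space model of it,
with a print at each decision point.  This is only a sketch and not the
raid1.c code: struct disk, disk_read(), disk_write(), fix_read_error() and
sync_block() are hypothetical names invented for the illustration, and a
single in-memory "block" with a bad flag stands in for a real sector.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define NDISKS 4   /* four mirrors, as in the reported array */
#define BLOCK  8   /* toy block size in bytes */

/* Hypothetical stand-in for one mirror: one block of data plus a flag
 * marking that block unreadable until it is rewritten. */
struct disk {
	char data[BLOCK];
	bool bad;
};

/* Models a read that fails on a bad block. */
static bool disk_read(const struct disk *d, char *buf)
{
	if (d->bad)
		return false;
	memcpy(buf, d->data, BLOCK);
	return true;
}

/* Models a write; rewriting the block clears the bad flag. */
static void disk_write(struct disk *d, const char *buf)
{
	memcpy(d->data, buf, BLOCK);
	d->bad = false;
}

/* Models the recovery step: starting at the disk whose read failed, try
 * each mirror in turn until one reads, then write that data back to every
 * mirror that failed.  Returns the number of mirrors rewritten, or -1 if
 * no mirror could be read. */
static int fix_read_error(struct disk disks[], int first)
{
	char buf[BLOCK];
	bool failed[NDISKS] = { false };
	bool ok = false;
	int i, d = first, fixed = 0;

	assert(first >= 0 && first < NDISKS);
	for (i = 0; i < NDISKS; i++) {
		printf("trace: re-reading disk %d\n", d);
		if (disk_read(&disks[d], buf)) {
			ok = true;
			break;
		}
		failed[d] = true;
		d = (d + 1) % NDISKS;
	}
	if (!ok)
		return -1;
	for (i = 0; i < NDISKS; i++) {
		if (failed[i]) {
			printf("trace: rewriting disk %d\n", i);
			disk_write(&disks[i], buf);
			fixed++;
		}
	}
	return fixed;
}

/* Models the read-completion check followed by the write-side check:
 * a failed read leaves the "uptodate" flag clear, which triggers the
 * recovery step above. */
static int sync_block(struct disk disks[], int read_disk)
{
	char buf[BLOCK];
	bool uptodate = disk_read(&disks[read_disk], buf);

	if (!uptodate) {
		printf("trace: read of disk %d failed, uptodate not set\n",
		       read_disk);
		return fix_read_error(disks, read_disk);
	}
	return 0;
}
```

In this model, filling four disks with the same data, marking disk 2 bad and
calling sync_block(disks, 2) returns 1 and leaves disk 2 readable again.  The
"re-reading" trace line for the failing disk is the model's counterpart of
the second SCSI read error mentioned above.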

Are you able to add some tracing and recompile the kernel and see if you can
find out what is happening?
e.g.
  if end_sync_read() doesn't see BIO_UPTODATE, print something.
  if sync_request_write() doesn't see R1BIO_Uptodate, print something.
  when fix_sync_read_error() calls sync_page_io(), print something.

??

Thanks,
NeilBrown



> 
> This is kernel 3.10.15.
> 
> Thank you!
> 
> /mjt



