All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: Why are bad disk sectors numbered strangely, and what happens to them?
@ 2003-10-13  9:31 Norman Diamond
       [not found] ` <200310131014.h9DAEwY3000241@81-2-122-30.bradfords.org.uk>
  0 siblings, 1 reply; 64+ messages in thread
From: Norman Diamond @ 2003-10-13  9:31 UTC (permalink / raw)
  To: linux-kernel

Thanks to Andreas Jellinghaus's suggestion, I ran smartctl logs and tests.
My Linux questions increase in number, but first here are the results.

Before testing, the log included a count of 92 errors, of which the
latest 5 had details available.  Reallocated_Sector_Ct was 1 and
Reallocated_Event_Count was 1.  The offline test succeeded and changed
nothing.  The long self-test found one read error.  After testing, the log
still included a count of 92 errors, of which the latest 5 had details
available, and they were the same 5, so the firmware didn't update
that log with the error that was detected by its self-test.    However,
Reallocated_Sector_Ct was 2 and Reallocated_Event_Count was 2.

The self-test saved one detail of its read error separately from the main
log.  LBA_of_first_error was 0x0122403a.  In decimal this was a very
familiar-looking 19021882.

The sector is in a Reiser partition, which might affect some of the
following questions.

So, why do the syslog entries have so many "sector" numbers, which are
mostly different except for some repetitions, and mostly different from
"LBAsect"?  It seems that LBAsect is the correct number of the bad sector.

How can I find out which file contains the bad sector?  I would like to try
to recreate the file from a source of good data.

How can I tell Linux to mark the sector as bad, knowing the LBA sector
number?

Or did the drive's firmware mark the sector as bad during its self-test?  Is
this why the number of reallocations increased from 1 to 2?  But if so, why
didn't this happen when Linux tried to read the sector?

How can I tell Linux to read every sector in the partition?  Oh, I might
know this one,
  dd if=/dev/hda8 of=/dev/null
I want to make sure that the drive is now using a non-defective replacement
sector.


^ permalink raw reply	[flat|nested] 64+ messages in thread
* Re: Why are bad disk sectors numbered strangely, and what happens to them?
@ 2003-10-12  8:25 Norman Diamond
  0 siblings, 0 replies; 64+ messages in thread
From: Norman Diamond @ 2003-10-12  8:25 UTC (permalink / raw)
  To: aj, linux-kernel

Andreas Jellinghaus replied to me with useful advice.  But he didn't really
answer my questions.  Please, if anyone knows the answers to my questions,
please kindly say.
(Why the sectors were numbered so strangely,
what does Linux do with them after detecting them,
and how to know if the errors occured during writes or during reads.)

Anyway,

> try the smartmontools package, it has "smartctl" that will
> show you the discs S.M.A.R.T. details

Good idea, thank you.

> doing a backup couldn't hurt.

It's essentially my crash box at the moment.  But I didn't expect visible
errors on a 2-year-old disk.  (Of course the magnetic layer always has
errors but I didn't expect things to get beyond the firmware's automatic
assignment and writing of replacement sectors.)

And my reason for posting is that the error logs didn't look the way I would
have expected, regarding the sector numbers and the repetitions.

> btw: are you sure cables are ok?

1.  There are none.
2.  If the connector on the motherboard were coming loose from the
motherboard, or if the motherboard had a crack causing intermittent failures
in some of its connections, surely the I/O errors would be far more numerous
and far more random than the strange occurences I observed.


^ permalink raw reply	[flat|nested] 64+ messages in thread
* Why are bad disk sectors numbered strangely, and what happens to them?
@ 2003-10-11  9:00 Norman Diamond
  2003-10-11  9:39 ` Andreas Jellinghaus
  0 siblings, 1 reply; 64+ messages in thread
From: Norman Diamond @ 2003-10-11  9:00 UTC (permalink / raw)
  To: linux-kernel

My first question is why the bad disk sectors are numbered strangely, and
second is what does Linux do with them after detecting them?

I repartitioned and reformatted two Reiser partitions before installing SuSE
8.2 and then compiling kernels 2.6.0-test5, test6, and test7.  My feeling is
that the following errors "should" have been detected during writes, so the
damage "should" not be too bad.  The correct data "should" get written to
replacement sectors.  But my understanding of modern ATA drives is that the
firmware "should" have detected the errors during writes and "should" have
finished the work without the OS knowing about it.

If the following errors occured during reads then I have some pretty angry
questions about why they didn't get detected during writes, especially when
the writes occured minutes or milliseconds prior to the reads.  (I'll copy
this message to some Toshiba employees.  Maybe the next time they visit,
certain persons should get cat food instead of my wife's cooking  _^o^_
MK4018GAP, about 2 years old.)

Hmm, I guess I also need to ask how to figure out if these occured during
writes or reads.

Meanwhile, it seems really strange to see separate numbers for LBAsect and
sector, and to see that the two numbers are sometimes related but sometimes
apparently unrelated, and to see LBAsect remain constant while sector
changes with each error.  What is really going on here?

Also kernel 2.6.0-test7 no longer says whether hda was on 03:08 or 03:00
when the errors were detected.

Sep 27 16:49:41 diamondpana kernel: hda: dma_intr: status=0x51
   { DriveReady SeekComplete Error }
Sep 27 16:49:41 diamondpana kernel: hda: dma_intr: error=0x40
   { UncorrectableError }, LBAsect=19021882, sector=852296
Sep 27 16:49:41 diamondpana kernel: end_request: I/O error,
   dev 03:08 (hda), sector 852296
Sep 27 16:49:41 diamondpana kernel: hda: dma_intr: status=0x51
   { DriveReady SeekComplete Error }
Sep 27 16:49:41 diamondpana kernel: hda: dma_intr: error=0x40
   { UncorrectableError }, LBAsect=19021882, sector=852304
Sep 27 16:49:41 diamondpana kernel: end_request: I/O error,
   dev 03:08 (hda), sector 852304
[comment: no more that day]

Sep 28 15:20:20 diamondpana kernel: hda: dma_intr: status=0x51
   { DriveReady SeekComplete Error }
Sep 28 15:20:20 diamondpana kernel: hda: dma_intr: error=0x40
   { UncorrectableError }, LBAsect=19021882, sector=19021784
Sep 28 15:20:20 diamondpana kernel: end_request: I/O error,
   dev 03:00 (hda), sector 19021784
Sep 28 15:20:20 diamondpana kernel: hda: dma_intr: status=0x51
   { DriveReady SeekComplete Error }
Sep 28 15:20:20 diamondpana kernel: hda: dma_intr: error=0x40
   { UncorrectableError }, LBAsect=19021882, sector=19021786
Sep 28 15:20:20 diamondpana kernel: end_request: I/O error,
   dev 03:00 (hda), sector 19021786
[... every even-numbered sector in this range ...]
Sep 28 15:20:21 diamondpana kernel: hda: dma_intr: status=0x51
   { DriveReady SeekComplete Error }
Sep 28 15:20:21 diamondpana kernel: hda: dma_intr: error=0x40
   { UncorrectableError }, LBAsect=19021882, sector=19021880
Sep 28 15:20:21 diamondpana kernel: end_request: I/O error,
   dev 03:00 (hda), sector 19021880
Sep 28 15:20:21 diamondpana kernel: hda: dma_intr: status=0x51
   { DriveReady SeekComplete Error }
Sep 28 15:20:21 diamondpana kernel: hda: dma_intr: error=0x40
   { UncorrectableError }, LBAsect=19021882, sector=19021882
Sep 28 15:20:21 diamondpana kernel: end_request: I/O error,
   dev 03:00 (hda), sector 19021882
[comment: after hitting equality, it soon repeated from the middle]
Sep 28 15:20:26 diamondpana kernel: hda: dma_intr: status=0x51
   { DriveReady SeekComplete Error }
Sep 28 15:20:26 diamondpana kernel: hda: dma_intr: error=0x40
   { UncorrectableError }, LBAsect=19021882, sector=19021832
Sep 28 15:20:26 diamondpana kernel: end_request: I/O error,
   dev 03:00 (hda), sector 19021832
Sep 28 15:20:26 diamondpana kernel: hda: dma_intr: status=0x51
   { DriveReady SeekComplete Error }
Sep 28 15:20:26 diamondpana kernel: hda: dma_intr: error=0x40
   { UncorrectableError }, LBAsect=19021882, sector=19021834
Sep 28 15:20:26 diamondpana kernel: end_request: I/O error,
   dev 03:00 (hda), sector 19021834
[... every even-numbered sector in this range ...]
Sep 28 15:20:26 diamondpana kernel: hda: dma_intr: status=0x51
   { DriveReady SeekComplete Error }
Sep 28 15:20:26 diamondpana kernel: hda: dma_intr: error=0x40
   { UncorrectableError }, LBAsect=19021882, sector=19021880
Sep 28 15:20:26 diamondpana kernel: end_request: I/O error,
   dev 03:00 (hda), sector 19021880
Sep 28 15:20:26 diamondpana kernel: hda: dma_intr: status=0x51
   { DriveReady SeekComplete Error }
Sep 28 15:20:26 diamondpana kernel: hda: dma_intr: error=0x40
   { UncorrectableError }, LBAsect=19021882, sector=19021882
Sep 28 15:20:26 diamondpana kernel: end_request: I/O error,
   dev 03:00 (hda), sector 19021882
[comment:  after hitting equality again, no more that day]

Sep 29 01:24:09 diamondpana kernel: hda: dma_intr: status=0x51
   { DriveReady SeekComplete Error }
Sep 29 01:24:09 diamondpana kernel: hda: dma_intr: error=0x40
   { UncorrectableError }, LBAsect=19021882, sector=852296
Sep 29 01:24:09 diamondpana kernel: end_request: I/O error,
   dev 03:08 (hda), sector 852296
Sep 29 01:24:09 diamondpana kernel: hda: dma_intr: status=0x51
   { DriveReady SeekComplete Error }
Sep 29 01:24:09 diamondpana kernel: hda: dma_intr: error=0x40
   { UncorrectableError }, LBAsect=19021882, sector=852304
Sep 29 01:24:09 diamondpana kernel: end_request: I/O error,
   dev 03:08 (hda), sector 852304
[comment:  same sectors as on Sep 27]

Oct 10 18:29:29 diamondpana kernel: hda: dma_intr: status=0x51
   { DriveReady SeekComplete Error }
Oct 10 18:29:29 diamondpana kernel: hda: dma_intr: error=0x40
   { UncorrectableError }, LBAsect=19021882, sector=19021842
Oct 10 18:29:29 diamondpana kernel: end_request: I/O error,
   dev hda, sector 19021842
Oct 10 18:29:29 diamondpana kernel: hda: dma_intr: status=0x51
   { DriveReady SeekComplete Error }
Oct 10 18:29:29 diamondpana kernel: hda: dma_intr: error=0x40
   { UncorrectableError }, LBAsect=19021882, sector=19021850
Oct 10 18:29:29 diamondpana kernel: end_request: I/O error,
   dev hda, sector 19021850
[... every 8th sector in this range, congruent to 2 modulo 8 ...]
Oct 10 18:29:30 diamondpana kernel: hda: dma_intr: status=0x51
   { DriveReady SeekComplete Error }
Oct 10 18:29:30 diamondpana kernel: hda: dma_intr: error=0x40
   { UncorrectableError }, LBAsect=19021882, sector=19021874
Oct 10 18:29:30 diamondpana kernel: end_request: I/O error,
   dev hda, sector 19021874
Oct 10 18:29:30 diamondpana kernel: hda: dma_intr: status=0x51
   { DriveReady SeekComplete Error }
Oct 10 18:29:30 diamondpana kernel: hda: dma_intr: error=0x40
   { UncorrectableError }, LBAsect=19021882, sector=19021882
Oct 10 18:29:30 diamondpana kernel: end_request: I/O error,
   dev hda, sector 19021882
[comment: some of the same sectors as on Sep 28]


^ permalink raw reply	[flat|nested] 64+ messages in thread

end of thread, other threads:[~2003-10-21 20:54 UTC | newest]

Thread overview: 64+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-10-13  9:31 Why are bad disk sectors numbered strangely, and what happens to them? Norman Diamond
     [not found] ` <200310131014.h9DAEwY3000241@81-2-122-30.bradfords.org.uk>
2003-10-13 10:24   ` Norman Diamond
2003-10-13 10:33     ` John Bradford
2003-10-13 11:30       ` Norman Diamond
2003-10-13 11:58         ` Maciej Zenczykowski
2003-10-15 10:22           ` Norman Diamond
2003-10-13 12:02         ` John Bradford
2003-10-15 10:23           ` Norman Diamond
2003-10-15 18:56             ` Pavel Machek
2003-10-14  6:54         ` Rogier Wolff
2003-10-13 14:24     ` Chuck Campbell
2003-10-13 14:54       ` Maciej Zenczykowski
2003-10-13 16:29         ` Roger Larsson
2003-10-14  6:49     ` Rogier Wolff
2003-10-14  7:05       ` Wes Janzen
2003-10-14  7:21         ` John Bradford
2003-10-14  7:40           ` Rogier Wolff
2003-10-14  8:11             ` John Bradford
2003-10-14  8:45               ` Hans Reiser
2003-10-14  9:46                 ` Rogier Wolff
2003-10-14  9:57                   ` Hans Reiser
2003-10-14 10:10                     ` Rogier Wolff
2003-10-14 10:31                       ` Hans Reiser
2003-10-14 10:19                 ` John Bradford
     [not found]             ` <200310140800.h9E80BT9000815@81-2-122-30.bradfords.org.uk>
     [not found]               ` <20031014081110.GA14418@bitwizard.nl>
2003-10-14  8:55                 ` Wes Janzen
2003-10-14 10:05                   ` Rogier Wolff
2003-10-14  7:24         ` Rogier Wolff
2003-10-14  9:04         ` Hans Reiser
2003-10-15 10:23           ` Norman Diamond
2003-10-15 10:39             ` Hans Reiser
2003-10-17  9:40           ` Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?) Norman Diamond
2003-10-17  9:48             ` Hans Reiser
2003-10-17 11:11               ` Norman Diamond
2003-10-17 11:45                 ` Hans Reiser
2003-10-17 11:51                 ` John Bradford
2003-10-17 12:53                 ` John Bradford
2003-10-17 13:03                   ` Russell King
2003-10-17 13:26                     ` John Bradford
2003-10-19  7:50                   ` Andre Hedrick
2003-10-17 13:04                 ` Russell King
2003-10-17 14:09                   ` Norman Diamond
2003-10-17  9:58             ` Pavel Machek
2003-10-17 10:15               ` Hans Reiser
2003-10-17 10:24             ` Rogier Wolff
2003-10-17 10:49               ` John Bradford
2003-10-17 11:09                 ` Rogier Wolff
2003-10-17 11:24                 ` Krzysztof Halasa
2003-10-17 19:35                   ` John Bradford
2003-10-17 23:28                     ` Krzysztof Halasa
2003-10-18  7:42                       ` Pavel Machek
2003-10-18  8:30                         ` John Bradford
2003-10-21 20:26                           ` bill davidsen
2003-10-18  8:27                       ` John Bradford
2003-10-18 12:02                         ` Krzysztof Halasa
2003-10-18 16:26                           ` Nuno Silva
2003-10-18 20:16                             ` Krzysztof Halasa
     [not found]                     ` <m37k33igui.fsf@defiant. <m3u166vjn0.fsf@defiant.pm.waw.pl>
2003-10-21 20:39                       ` bill davidsen
2003-10-17 10:37             ` ATA Defect management John Bradford
2003-10-21 20:44               ` bill davidsen
2003-10-17 12:08             ` Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?) Justin Cormack
2003-10-21 20:12             ` bill davidsen
  -- strict thread matches above, loose matches on Subject: below --
2003-10-12  8:25 Why are bad disk sectors numbered strangely, and what happens to them? Norman Diamond
2003-10-11  9:00 Norman Diamond
2003-10-11  9:39 ` Andreas Jellinghaus

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.