RE: IDE DMA errors, massive disk corruption: Why? Fixed Yet? Why not re-do failed op?

* RE: IDE DMA errors, massive disk corruption: Why? Fixed Yet? Why not re-do failed op?
@ 2003-10-06 19:32 Mudama, Eric
  2003-10-06 20:20 ` IDE DMA errors, massive disk corruption: Why? Fixed Yet? Why " Daniel B.
  2003-10-10  1:10 ` IDE DMA errors, massive disk corruption: Why? Fixed Yet? W hy " Greg Stark
  0 siblings, 2 replies; 12+ messages in thread
From: Mudama, Eric @ 2003-10-06 19:32 UTC
  To: 'Daniel B.', linux-kernel

> -----Original Message-----
> From: Daniel B. [mailto:dsb@smart.net]
> Sent: Monday, October 06, 2003 12:42 PM
> To: linux-kernel@vger.kernel.org
> Subject: IDE DMA errors, massive disk corruption: Why? Fixed Yet? Why
> not re-do failed op?
> 
> Doesn't the kernel keep track of uncompleted operations,
> retain the information needed to try again, and try again
> if there's a failure?  If not, why not?

If the disk has write cache enabled, this isn't necessarily possible, since
there's nothing in the IDE specification that guarantees the order of writes
to the media without a FLUSH CACHE (EXT) command.

Hypothetically, if you were doing full-pack random writes continuously with
no idle time and no FLUSH CACHE, you can have writes that are days old still
in the drive's buffer and still un-attempted.  A write with write-cache
enabled reports ending status at the completion of the transfer.  There is
no mechanism to tell the host that a cached write failed, other than giving
an error on the next command.
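
A minimal sketch of that deferred error reporting, as a toy model only. The
class `CachedDrive`, its methods, and the status strings are hypothetical
names made up for illustration; they are not ATA commands or kernel APIs.
The point it shows is that the write which fails on the media has already
returned "OK", and the failure surfaces on an unrelated later command.

```python
class CachedDrive:
    """Toy model of a drive with write-back caching (not real ATA behaviour)."""

    def __init__(self, bad_lbas=()):
        self.cache = []            # writes accepted but not yet on the media
        self.media = {}            # what has actually reached the platters
        self.bad_lbas = set(bad_lbas)
        self.pending_error = None  # a failed cached write waiting to be reported

    def write(self, lba, data):
        # An earlier cached-write failure can only be reported now,
        # on whatever command happens to come next.
        if self.pending_error is not None:
            failed_lba, self.pending_error = self.pending_error, None
            return f"ERROR (from earlier write to LBA {failed_lba})"
        self.cache.append((lba, data))
        return "OK"  # status is given when the transfer ends, not when the media is written

    def background_flush(self):
        # The drive commits cached writes whenever it gets around to it.
        while self.cache:
            lba, data = self.cache.pop(0)
            if lba in self.bad_lbas:
                self.pending_error = lba
            else:
                self.media[lba] = data


drive = CachedDrive(bad_lbas={7})
first = drive.write(7, b"metadata")   # accepted into the cache
drive.background_flush()              # the media write fails, host is not told
second = drive.write(8, b"payload")   # an unrelated command receives the error
print(first)
print(second)
```

In this model the host has no way to tie the error back to the original
request unless it has kept every unflushed write around, which is the
difficulty described above.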

Obviously, drive companies have techniques to prevent this (data staying in
buffer for too long) from happening, but they are all vendor specific and
not part of the specification.

The flip side of this, running your drive with write cache off, is rather
destructive to performance in a modern IDE drive... anywhere from 33% as
fast down to 0.1% as fast, depending on the workload.

> If it can't try again, shouldn't the kernel at least abort after one 
> disk-write failure instead of performing additional writes, which
> frequently depend on the previous writes?  (E.g., if I try to read 
> block 1's data and write it to block 2, and then write something new 
> to block 1, if the first write fails but continue and do the second
> write, data gets destroyed.  If the first write fails and I stop right
> away, less is destroyed.)

If a modern IDE disk gets a fatal write, it is toast.  The lengths drives go
through attempting to reassign to a new location are rather heroic IMO.

Any drive that gets a "real" fatal write (0x71 status for example) as
opposed to a timeout needs to be RMA'd back to the vendor.  Some drives will
work in a read-only mode if they get power cycled, but it isn't always
guaranteed.  If you can get your data off, do so immediately, and replace
the drive.

--eric


* Re: IDE DMA errors, massive disk corruption: Why? Fixed Yet? Why not re-do failed op?
  2003-10-06 19:32 IDE DMA errors, massive disk corruption: Why? Fixed Yet? W hy not re-do failed op? Mudama, Eric
@ 2003-10-06 20:20 ` Daniel B.
  2003-10-06 20:45   ` Valdis.Kletnieks
  2003-10-10  1:10 ` IDE DMA errors, massive disk corruption: Why? Fixed Yet? W hy " Greg Stark
  1 sibling, 1 reply; 12+ messages in thread
From: Daniel B. @ 2003-10-06 20:20 UTC
  Cc: linux-kernel

"Mudama, Eric" wrote:
... 
> > Doesn't the kernel keep track of uncompleted operations,
> > retain the information needed to try again, and try again
> > if there's a failure?  If not, why not?
> 
> If the disk has write cache enabled, this isn't necessarily possible, since
> there's nothing in the IDE specification that guarantees the order of writes
> to the media without a FLUSH CACHE (EXT) command.

Are you sure?  If you issue a write to block 1 and then issue another
write to block 1, it would have to guarantee the relative order of those 
writes (or equivalent optimization in the write cache), wouldn't it?

> Hypothetically, if you were doing full-pack random writes continuously with
> no idle time and no FLUSH CACHE, you can have writes that are days old still
> in the drive's buffer and still un-attempted.  A write with write-cache
> enabled reports ending status at the completion of the transfer.  There is
> no mechanism to tell the host that a cached write failed, other than giving
> an error on the next command.

But we're not talking about errors IN the disk drive after the
communication between the kernel and drive is already done.  We're talking
about errors in the communication BETWEEN the kernel and the drive (lost
DMA interrupts), aren't we?

If the kernel issues a write command to the drive, and never gets a 
response (DMA-complete interrupt?) from the drive that it has accepted 
the command, why can't the kernel repeat the write command?

Daniel
-- 
Daniel Barclay
dsb@smart.net


* Re: IDE DMA errors, massive disk corruption: Why? Fixed Yet? Why not re-do failed op?
  2003-10-06 20:20 ` IDE DMA errors, massive disk corruption: Why? Fixed Yet? Why " Daniel B.
@ 2003-10-06 20:45   ` Valdis.Kletnieks
  2003-10-06 21:07     ` Daniel B.
  0 siblings, 1 reply; 12+ messages in thread
From: Valdis.Kletnieks @ 2003-10-06 20:45 UTC
  To: Daniel B.; +Cc: linux-kernel

On Mon, 06 Oct 2003 16:20:42 EDT, "Daniel B." said:

> Are you sure?  If you issue a write to block 1 and then issue another
> write to block 1, it would have to guarantee the relative order of those 
> writes (or equivalent optimization in the write cache), wouldn't it?

If the old 'block 1' data is still in the write cache, then another write
should overlay it - that's a very basic optimization.  Consider the case of a
very active block that has a popular inode that's being atime-updated a lot (or
whatever causes a lot of activity - ignore the in-memory cache and sync/fsync
for the moment). You really don't want 34 writes to the same block taking up 34
blocks of space in the write cache....

The ordering issue comes when the following type of thing happens:

1) a write for block 993 is issued (metadata, perhaps)
2) a write for block 10934 is issued - actual file contents or something that
depends on 993 being written.
3) Disk writes 10934 out.
4) Things go bad  (power hit, whatever) before 993 gets written out.
5) fsck. ;)
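
The scenario above can be sketched in a few lines. This is only an
illustration under assumptions: the function `surviving_blocks` and the
chosen flush order are hypothetical, and a real drive picks its own order
by rules that are not visible to the host.

```python
def surviving_blocks(issued, flush_order, writes_before_power_loss):
    """Return the blocks that reached the media before power was lost."""
    # The drive writes the same set of blocks, just not necessarily in host order.
    assert sorted(issued) == sorted(flush_order)
    return set(flush_order[:writes_before_power_loss])


issued = [993, 10934]        # host order: metadata first, then the dependent data
flush_order = [10934, 993]   # order the drive happened to choose (assumed)

on_media = surviving_blocks(issued, flush_order, writes_before_power_loss=1)

# The dependent block is on disk but the block it depends on is not.
inconsistent = 10934 in on_media and 993 not in on_media
print(on_media, inconsistent)
```

With `flush_order` equal to `issued`, the same power loss would leave only
block 993 on the media, which is the harmless case.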


* Re: IDE DMA errors, massive disk corruption: Why? Fixed Yet? Why not  re-do failed op?
  2003-10-06 20:45   ` Valdis.Kletnieks
@ 2003-10-06 21:07     ` Daniel B.
  2003-10-06 21:26       ` Jeff Garzik
  0 siblings, 1 reply; 12+ messages in thread
From: Daniel B. @ 2003-10-06 21:07 UTC
  Cc: linux-kernel

Valdis.Kletnieks@vt.edu wrote:
> 
> ...
> 
> The ordering issue comes when the following type of thing happens:
> 
> 1) a write for block 993 is issued (metadata, perhaps)
> 2) a write for block 10934 is issued - actual file contents or something that
> depends on 993 being written.
> 3) Disk writes 10934 out.
> 4) Things go bad  (power hit, whatever) before 993 gets written out.
> 5) fsck. ;)

Is that scenario relevant to DMA errors?

I'm talking about problems in steps 1 and 2, not in later steps.

If the kernel starts a write command for block 993, wouldn't it wait
for a DMA interrupt signalling that the drive has received and accepted
the command before the kernel starts the write command for block 10934?

If it timed out waiting for that interrupt, can't it re-issue the
write for block 993 before proceeding?

Daniel
-- 
Daniel Barclay
dsb@smart.net


* Re: IDE DMA errors, massive disk corruption: Why? Fixed Yet? Why not  re-do failed op?
  2003-10-06 21:07     ` Daniel B.
@ 2003-10-06 21:26       ` Jeff Garzik
  2003-10-07  5:24         ` IDE DMA errors, massive disk corruption: Why? Fixed Yet? Whynot " Daniel B.
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff Garzik @ 2003-10-06 21:26 UTC
  To: Daniel B.; +Cc: linux-kernel

Daniel B. wrote:
> If the kernel starts a write command for block 993, wouldn't it wait
> for a DMA interrupt signalling that the drive has received and accepted
> the command before the kernel starts the write command for block 10934?

With command queueing, no, it would not wait.


> If it timed out waiting for that interrupt, can't it re-issue the
> write for block 993 before proceeding?

Assuming a large amount of sanity in your OS driver... certainly.

	Jeff





* Re: IDE DMA errors, massive disk corruption: Why? Fixed Yet? Why not re-do failed op?
  2003-10-06 21:26       ` Jeff Garzik
@ 2003-10-07  5:24         ` Daniel B.
  2003-10-07  6:03           ` Valdis.Kletnieks
  0 siblings, 1 reply; 12+ messages in thread
From: Daniel B. @ 2003-10-07  5:24 UTC
  Cc: linux-kernel

Jeff Garzik wrote:
> 
> Daniel B. wrote:
> > If the kernel starts a write command for block 993, wouldn't it wait
> > for a DMA interrupt signalling that the drive has received and accepted
> > the command before the kernel starts the write command for block 10934?
> 
> With command queueing, no, it would not wait.

Other than the write-back caching, it's not an open-loop system, 
right?  Regardless of how commands are batched or queued, isn't there 
some acknowledgment back from the drive that some batch of commands
(or some command, or some part of some command) was completed?

Surely the kernel checks for such acknowledgments, right? 

DMA-complete interrupts are probably how some of those acknowledgments 
are communicated, right?

So if the kernel doesn't get an expected DMA interrupt, it should
know that some command(/batch/part) wasn't acknowledged successfully,
right?  And surely it can tell _which_ command/batch/part wasn't
acknowledged (if multiple ones can be outstanding), right?

So if some command/batch/etc. wasn't acknowledged, why can't the 
kernel retry the command/batch/etc.?

> > If it timed out waiting for that interrupt, can't it re-issue the
> > write for block 993 before proceeding?
> 
> Assuming a large amount of sanity in your OS driver... certainly.

Given the seriousness of disk data corruption, why isn't the Linux kernel
more reliable here?  Hasn't this family of IDE problems been around
for a couple of years now?

Daniel
-- 
Daniel Barclay
dsb@smart.net


* Re: IDE DMA errors, massive disk corruption: Why? Fixed Yet? Why not re-do failed op?
  2003-10-07  5:24         ` IDE DMA errors, massive disk corruption: Why? Fixed Yet? Whynot " Daniel B.
@ 2003-10-07  6:03           ` Valdis.Kletnieks
  2003-10-07 12:23             ` Ruth Ivimey-Cook
  2003-10-07 13:32             ` IDE DMA errors, massive disk corruption: Why? Fixed Yet? Why not re-do " Daniel B.
  0 siblings, 2 replies; 12+ messages in thread
From: Valdis.Kletnieks @ 2003-10-07  6:03 UTC
  To: Daniel B.; +Cc: linux-kernel

On Tue, 07 Oct 2003 01:24:19 EDT, "Daniel B." said:

> So if some command/batch/etc. wasn't acknowledged, why can't the 
> kernel retry the command/batch/etc.?

The problem is that the disk ack'ed the command when the block went into the
write cache.  You *DON'T* in general get back another ack when the block
actually hits the platters.

> Given the seriousness of disk data corruption, why isn't the Linux kernel
> more reliable here?  Hasn't this family of IDE problems been around
> for a couple of years now?

It's hard for the kernel to be more reliable unless you just disable the write cache.

The biggest reason we don't see more issues like this is that the average MTBF
really is up in the 100K hours and up range, and most drives probably get
around to actually writing all the blocks out every minute or so - so you're
looking at literally a 1 in a million shot at corruption.  Most of the time,
it's writing back in-order enough that no badness happens - and with the rise
of journaled file systems like ext3 and jfs and reiserfs, the chance of
actually getting bit by it drops even more (you'd have to hit a case where the
blocks were re-ordered *and* the corresponding journal blocks didn't get
written either).
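
As a rough check of those numbers, here is a back-of-the-envelope
calculation. It assumes a constant failure rate (so the chance of a failure
in a short window is roughly window / MTBF) and takes the "100K hours" and
"every minute or so" figures at face value; both are simplifications, not
measured data.

```python
import math

mtbf_hours = 100_000     # figure quoted above
window_minutes = 1       # assumed time a block may sit unwritten in the cache

mtbf_minutes = mtbf_hours * 60

# Exponential failure model: P(fail within t) = 1 - exp(-t / MTBF).
p_exact = -math.expm1(-window_minutes / mtbf_minutes)
# For t much smaller than MTBF this is approximately t / MTBF.
p_approx = window_minutes / mtbf_minutes

print(f"about 1 in {round(1 / p_approx):,}")
```

Under these assumptions the chance of the drive failing inside any given
one-minute window comes out at roughly one in six million, which is within
an order of magnitude of the "1 in a million" estimate.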

Yes, this family of problems has been around ever since write caches were
introduced. It's just taken until now that we've got file system code that's
rock solid enough that the write cache is a major reliability issue - for the
longest time, one kernel bug or another has been more of a concern.
See the IDE corruption in early 2.5 kernels that scared a LOT of people
away - I believe that one was done all by the kernel, without any help
from the disk's write cache. ;)


* Re: IDE DMA errors, massive disk corruption: Why? Fixed Yet? Why not re-do failed op?
  2003-10-07  6:03           ` Valdis.Kletnieks
@ 2003-10-07 12:23             ` Ruth Ivimey-Cook
  2003-10-07 13:46               ` IDE DMA errors, massive disk corruption: Why? Fixed Yet? Whynotre-do " Daniel B.
  2003-10-07 13:32             ` IDE DMA errors, massive disk corruption: Why? Fixed Yet? Why not re-do " Daniel B.
  1 sibling, 1 reply; 12+ messages in thread
From: Ruth Ivimey-Cook @ 2003-10-07 12:23 UTC
  To: Valdis.Kletnieks; +Cc: Daniel B., linux-kernel

On Tue, 7 Oct 2003, Valdis.Kletnieks@vt.edu wrote:
>On Tue, 07 Oct 2003 01:24:19 EDT, "Daniel B." said:
>> So if some command/batch/etc. wasn't acknowledged, why can't the 
>> kernel retry the command/batch/etc.?
>The problem is that the disk ack'ed the command when the block went into the
>write cache.  You *DON'T* in general get back another ack when the block
>actually hits the platters.

But surely what Daniel is complaining about is that the disk never did ack the 
bus transfer.

Consider this as a correct sequence of operations (hope I get it right:-) :


1.   Kernel uses IDE controller to initiate ATA disk write request:
     a. Kernel sets up DMA parameters (start, length, timeout)
     b. kernel initiates transfer of 1 sector to disk
     c. (in parallel with b) drive accepts transfer request and waits for data

2.   IDE controller DMA used to transfer data to disk unit:
     a. hardware DMA sends 256 16-bit words of data to disk
     b. (in parallel) drive accepts (acks) each word of data as it comes 
        over and writes it into internal buffer (be it a write cache or
        just a staging area).

3.   Transfer complete actions: when the required number of words are acked:
     a. IDE DMA controller fires end-of-transfer IRQ
     b. (in parallel) if write cache enabled, disk makes sector available to
        be written to disk (e.g. by linking the buffer into the write cache) 
        or, if write cache is disabled, initiates transfer to platter.

4.   Kernel sees end of transfer IRQ and initiates software ACK of transfer, 
     e.g. to remove DMA buffer from 'block dirty' list.

5.   If caching enabled, some time later the data in the drive is written to 
     the platter.


Now, the case I believe Daniel is complaining about is that things go well
through step 1 and perhaps some part of step 2. But, because the drive doesn't
accept the data or some other error, step 3 doesn't happen. Consequently, the
IDE DMA timeout happens, the kernel cries foul and things go wrong. So the
failure actually looks like this:


1.   Kernel uses IDE controller to initiate ATA disk write request:
     a. Kernel sets up DMA parameters (start, length, timeout)
     b. kernel initiates transfer of 1 sector to disk
     c. (in parallel with b) drive accepts transfer request and waits for data

2.   IDE controller DMA used to transfer data to disk unit:
     a. hardware DMA tries to send 256 16-bit words of data to disk
     b. (in parallel) drive accepts none or, perhaps, some data from bus into 
        internal buffer, but not all of it.

3.   After waiting, IDE controller fires DMA timeout IRQ.

4.   Kernel sees IRQ and emits warning message. Tries to reset bus and ....
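
The two sequences above differ only in whether all 256 words are accepted.
A small sketch of that decision follows; the function names and event
strings are hypothetical labels for the numbered steps, not kernel or
controller interfaces.

```python
WORDS_PER_SECTOR = 256  # 256 16-bit words = 512 bytes, as in the steps above


def dma_write_sector(words_accepted_by_drive):
    """Return the event the host sees for one single-sector DMA write."""
    if words_accepted_by_drive >= WORDS_PER_SECTOR:
        return "end-of-transfer IRQ"   # step 3 of the good sequence
    return "DMA timeout"               # step 3 of the failing sequence


def host_action(event):
    """Map the event to the step-4 reaction in each sequence."""
    if event == "end-of-transfer IRQ":
        return "mark buffer clean"     # software ACK of the transfer
    return "warn and reset bus"        # failing sequence


for accepted in (256, 100, 0):
    event = dma_write_sector(accepted)
    print(accepted, event, host_action(event))
```

Note that in this model the host cannot tell the 100-word case from the
0-word case: both surface as the same timeout.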



Have I got this scenario right?

Ruth

-- 
Ruth Ivimey-Cook
Software engineer and technical writer.



* Re: IDE DMA errors, massive disk corruption: Why? Fixed Yet? Why not  re-do failed op?
  2003-10-07  6:03           ` Valdis.Kletnieks
  2003-10-07 12:23             ` Ruth Ivimey-Cook
@ 2003-10-07 13:32             ` Daniel B.
  1 sibling, 0 replies; 12+ messages in thread
From: Daniel B. @ 2003-10-07 13:32 UTC
  To: Valdis.Kletnieks; +Cc: linux-kernel

Valdis.Kletnieks@vt.edu wrote:
> 
> On Tue, 07 Oct 2003 01:24:19 EDT, "Daniel B." said:
> 
> > So if some command/batch/etc. wasn't acknowledged, why can't the
> > kernel retry the command/batch/etc.?
> 
> The problem is that the disk ack'ed the command when the block went into the
> write cache.  

That's the acknowledgment I'm talking about.

> You *DON'T* in general get back another ack when the block
> actually hits the platters.

I know.  I wasn't talking about any acknowledge after actually writing
the data to the medium.

> > Given the seriousness of disk data corruption, why isn't the Linux kernel
> > more reliable here?  Hasn't this family of IDE problems been around
> > for a couple of years now?
> 
> It's hard for the kernel to be more reliable unless you just disable the write cache.

Again, I'm NOT talking about write-cache problems.  I'm talking about
problems in the communication/handshaking between the kernel and
the drive.

> The biggest reason we don't see more issues like this is that the average MTBF
> really is up in the 100K hours and up range

That reliability figure is for the _drives_.

That figure obviously does not apply to kernel-to-drive communication,
because I've had dozens of DMA-interrupt corruptions in the last two
or so years.

> Yes, this family of problems has been around ever since write caches were
> introduced. 

I'm not talking about problems related to write caches.  I'm talking 
about DMA interrupt problems.  Why do you think I'm talking about
inside-the-black-box write-cache problems?

> It's just taken until now that we've got file system code that's
> rock solid enough 

Rock solid?  Hah!  If file system (and other disk-related) code is so 
solid why did my root partition get screwed so badly it can't boot?  

(Even if it's bad hardware's fault that an interrupt got lost, and 
even if it's unreasonably complicated (or impossible) for the
kernel to retry an unacknowledged command, why didn't the kernel
stop writing to that disk after the first unacknowledged command?)

> that the write cache is a major reliability issue - for the
> longest time, one kernel bug or another has been more of a concern.

It's not "has been"--it is still a problem, in the newest (is .22
still the newest?) released stable kernel.

Daniel
-- 
Daniel Barclay
dsb@smart.net


* Re: IDE DMA errors, massive disk corruption: Why? Fixed Yet? Why not re-do failed op?
  2003-10-07 12:23             ` Ruth Ivimey-Cook
@ 2003-10-07 13:46               ` Daniel B.
  0 siblings, 0 replies; 12+ messages in thread
From: Daniel B. @ 2003-10-07 13:46 UTC
  Cc: linux-kernel

Ruth Ivimey-Cook wrote:
> 
> On Tue, 7 Oct 2003, Valdis.Kletnieks@vt.edu wrote:
> ...
> But surely what Daniel is complaining about is that the disk never did ack the
> bus transfer.

Yes, much closer to what I meant.  

Actually, I'm talking about when the kernel doesn't receive the
acknowledgment (the DMA interrupt).


> 
> Consider this as a correct sequence of operations (hope I get it right:-) :
> 
> 1.   Kernel uses IDE controller to initiate ATA disk write request:
...
> 2.   IDE controller DMA used to transfer data to disk unit:
...
> 3.   Transfer complete actions: when the required number of words are acked:
>      a. IDE DMA controller fires end-of-transfer IRQ
>      b. ...
> 
> 4.   Kernel sees end of transfer IRQ and initiates software ACK of transfer,
>      e.g. to remove DMA buffer from 'block dirty' list.
> 
> 5.   If caching enabled, some time later the data in the drive is written to
>      the platter.
> 
> Now, the case I believe Daniel is complaining about is that things go well
> through step 1 and perhaps some part of step 2. But, because the drive doesn't
> accept the data or some other error, step 3 doesn't happen. 

Actually, I think I'm talking about the very beginning of step 4--
the interrupt request doesn't actually make it to the kernel ("interrupt
lost"?), so the kernel doesn't see the interrupt request.

(I've been assuming that the errors I'm getting are just from DMA 
interrupt problems, that is, the drive accepted the data just fine, but 
the kernel didn't see the acknowledgement.  Of course, it's not clear
how that in itself would cause corruption, so I don't know for sure
that drive-side errors or rejections aren't involved.)


> Consequently, the
> IDE DMA timeout happens, the kernel cries foul and things go wrong. So the
> failure actually looks like this:
> 
> 1.   Kernel uses IDE controller to initiate ATA disk write request:
>      a. Kernel sets up DMA parameters (start, length, timeout)
>      b. kernel initiates transfer of 1 sector to disk
>      c. (in parallel with b) drive accepts transfer request and waits for data
> 
> 2.   IDE controller DMA used to transfer data to disk unit:
>      a. hardware DMA tries to send 256 16-bit words of data to disk
>      b. (in parallel) drive accepts none or, perhaps, some data from bus into
>         internal buffer, but not all of it.
> 
> 3.   After waiting, IDE controller fires DMA timeout IRQ.
> 
> 4.   Kernel sees IRQ and emits warning message. Tries to reset bus and ....

Actually, I'm thinking of the case where the interrupt request doesn't
make it to the kernel, so the kernel _doesn't_ see any IRQ in the expected 
time (and proceeds as you say).


> Have I got this scenario right?

Just about.

(Also, thanks for the DMA details.)



Daniel
-- 
Daniel Barclay
dsb@smart.net


* Re: IDE DMA errors, massive disk corruption: Why? Fixed Yet? Why not re-do failed op?
  2003-10-06 19:32 IDE DMA errors, massive disk corruption: Why? Fixed Yet? W hy not re-do failed op? Mudama, Eric
  2003-10-06 20:20 ` IDE DMA errors, massive disk corruption: Why? Fixed Yet? Why " Daniel B.
@ 2003-10-10  1:10 ` Greg Stark
  1 sibling, 0 replies; 12+ messages in thread
From: Greg Stark @ 2003-10-10  1:10 UTC
  To: Mudama, Eric; +Cc: 'Daniel B.', linux-kernel

"Mudama, Eric" <eric_mudama@Maxtor.com> writes:

> If the disk has write cache enabled, this isn't necessarily possible, since
> there's nothing in the IDE specification that guarantees the order of writes
> to the media without a FLUSH CACHE (EXT) command.

So, uhm, is there an interface exporting this command to applications?
Databases like Postgres would love to be able to issue such a command.

As it stands they have to do some awful hacks with fsync and sync. Postgres in
particular at certain points just calls sync and then waits an arbitrary time
hoping that that should be enough to get everything to disk.

Some users have in fact resorted to disabling the cache on their ide drives.
And of course it absolutely demolishes performance. Having it be disabled just
at the few points in time when it actually matters would be a huge improvement.
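
For reference, the application-level tool discussed here looks like the
sketch below. `os.fsync` is a real call that asks the kernel to write a
file's data to the storage device; whether that also makes the drive empty
its own write cache depends on the kernel, filesystem, and drive, and this
sketch makes no claim about that. The helper name `durable_write` and the
file name are made up for illustration.

```python
import os
import tempfile


def durable_write(path, payload):
    """Write payload and ask the kernel to push it to the storage device."""
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()             # move Python's userspace buffer into the kernel
        os.fsync(f.fileno())  # ask the kernel to write the file to the device


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "wal.log")
    durable_write(path, b"commit record\n")
    with open(path, "rb") as f:
        data = f.read()

print(data)
```

Calling `fsync` per file at commit points is narrower than a global `sync`
followed by a fixed wait, but it still only reaches as far as the kernel's
request to the drive.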

-- 
greg


* RE: IDE DMA errors, massive disk corruption: Why? Fixed Yet? Why not re-do failed op?
@ 2003-10-06 20:46 Mudama, Eric
  0 siblings, 0 replies; 12+ messages in thread
From: Mudama, Eric @ 2003-10-06 20:46 UTC
  To: 'Daniel B.'; +Cc: linux-kernel

> -----Original Message-----
> From: Daniel B. [mailto:dsb@smart.net]
> Sent: Monday, October 06, 2003 2:21 PM
> Cc: linux-kernel@vger.kernel.org
> Subject: Re: IDE DMA errors, massive disk corruption: Why? Fixed Yet?
> Why not re-do failed op?
> 
> Are you sure?  If you issue a write to block 1 and then issue another
> write to block 1, it would have to guarantee the relative order of those
> writes (or equivalent optimization in the write cache), wouldn't it?

Relative order of two writes to the same LBA is guaranteed, however the bus
order of two distinct writes is not required to be the same as the disk-work
order of those same two writes.

Picture the states as:
X (initial)
A (write 1 to LBA n)
B (write 2 to LBA n)

There are two possibilities that are both "legal":

1. drive maintains separate buffer space for both writes, and does them in
order

2. drive shares buffer space for both writes, and the 2nd write "corrupts"
the first one. There are three different things that can occur in this
situation of simultaneous disk and cable IO:

2a) Drive completes first write before 2nd bus transfer occurs, this results
in two distinct correct states on the media

	X -> A -> B

2b) Drive is in the middle of the first write when 2nd bus transfer occurs,
this results in a write splice which the drive must detect and then rewrite
the data in the buffer which is "correct":

	X -> A'B' -> B

2c) Drive hasn't started the write when the 2nd bus transfer occurs, so only
a single physical write actually needs to occur.  The drive actually
transfers from 

	X -> B

In all 3 cases, you should end up in state B.  (All this is in the absence
of reads, FYI).  Case 2 is *much* faster for local-area IO... Case 1
guarantees at least 1 rev of rotational latency per operation on
overlapped/repetitive writes in the steady-state, whereas Case 2 requires
more internal brains but can accept writes at bus speed regardless of
overlaps.  Case 1 is also less efficient for cache space, since you could
conceivably use the entire 8MB drive cache to hold 16K copies of the same
LBA.
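
The cache-space arithmetic and the difference between the two cases can be
sketched as follows. The 512-byte sector size is an assumption (it is what
makes 8MB come out to 16K entries), and `case1_slots` / `case2_cache` are
hypothetical names for the two buffering strategies, not drive firmware.

```python
SECTOR_BYTES = 512                 # assumed sector size
CACHE_BYTES = 8 * 1024 * 1024      # the 8MB cache mentioned above

copies = CACHE_BYTES // SECTOR_BYTES   # how many single-sector writes fit


def case1_slots(writes):
    """Case 1: every write keeps its own buffer, even for a repeated LBA."""
    return len(writes)


def case2_cache(writes):
    """Case 2: a later write to the same LBA overlays the earlier one."""
    cache = {}
    for lba, data in writes:
        cache[lba] = data
    return cache


writes = [(42, "A"), (42, "B")]   # write 1 then write 2 to the same LBA
print(copies)
print(case1_slots(writes))
print(case2_cache(writes))
```

Either way the final content for the LBA is B; Case 2 simply gets there
using one cache slot instead of two.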

In either case, an error of *any* kind on a write means that the entire
region you were writing should be considered invalid, and you should
re-write the entire transfer.

> But we're not talking about errors IN the disk drive after the
> communication between the kernel and drive is already done.  We're talking
> about errors in the communication BETWEEN the kernel and the drive (lost
> DMA interrupts), aren't we?
>
> If the kernel issues a write command to the drive, and never gets a
> response (DMA-complete interrupt?) from the drive that it has accepted
> the command, why can't the kernel repeat the write command?

In that case (which I guess is the whole issue) the kernel should repeat the
write command.  If the DMA never completes for some reason, the entire DMA
transfer should be considered invalid and re-done.  Reading a drive after a
partial data transfer has unspecified results. (Though a lot of OEMs test
for this sort of thing to figure out how each vendor's implementation
varies)
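
A minimal sketch of "treat the whole transfer as invalid and redo it",
assuming a hypothetical `do_transfer` callable that returns True when the
completion interrupt arrives and False on a DMA timeout. None of these
names come from the Linux IDE driver; the retry count is arbitrary.

```python
def write_with_retry(do_transfer, sectors, max_attempts=3):
    """Re-send the whole transfer until it completes or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        if do_transfer(sectors):
            return attempt   # whole transfer acknowledged
        # On timeout, nothing is assumed about partially transferred data:
        # the next attempt starts again from the first sector.
    raise IOError(f"transfer failed after {max_attempts} attempts; stop writing")


# Fake transport that loses the first completion and then succeeds.
outcomes = iter([False, True])
log = []


def flaky_transfer(sectors):
    log.append(list(sectors))
    return next(outcomes)


attempts = write_with_retry(flaky_transfer, [993, 994])
print(attempts, log)
```

Raising once the attempts are exhausted, rather than moving on to the next
write, matches the earlier point in the thread that later writes often
depend on the one that failed.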

--eric

