linux-kernel.vger.kernel.org archive mirror
* [Question] Does the kernel ignore errors writng to disk?
@ 2005-04-27 18:40 mike.miller
  2005-04-27 19:12 ` Richard B. Johnson
  2005-04-28 14:58 ` Alan Cox
  0 siblings, 2 replies; 15+ messages in thread
From: mike.miller @ 2005-04-27 18:40 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-scsi, brace

Hello All,
I have observed some behavior under certain failure conditions that seems as if the kernel may be ignoring write errors to disk. 
During very heavy read/write I/O, if we force a disk to fail, requests continue to be submitted until the controller's queue is full. Ultimately, the requests are timed out by the controller. When this happens we see filesystem corruption. Sometimes it's the file data, other times it's filesystem metadata that has been timed out and failed. Either way it's obviously undesirable behavior.
It looks like the OS/filesystem (ext2/3 and reiserfs) does not wait for a successful completion. Is this assumption correct?

Thanks,
mikem


* Re: [Question] Does the kernel ignore errors writng to disk?
  2005-04-27 18:40 [Question] Does the kernel ignore errors writng to disk? mike.miller
@ 2005-04-27 19:12 ` Richard B. Johnson
  2005-04-28 14:58 ` Alan Cox
  1 sibling, 0 replies; 15+ messages in thread
From: Richard B. Johnson @ 2005-04-27 19:12 UTC (permalink / raw)
  To: mike.miller; +Cc: linux-kernel, linux-scsi, brace

On Wed, 27 Apr 2005 mike.miller@hp.com wrote:

> Hello All,
> I have observed some behavior under certain failure conditions that seems
> as if the kernel may be ignoring write errors to disk.
> During very heavy read/write I/O, if we force a disk to fail, requests
> continue to be submitted until the controller's queue is full.
> Ultimately, the requests are timed out by the controller. When this
> happens we see filesystem corruption. Sometimes it's the file data,
> other times it's filesystem metadata that has been timed out and
> failed. Either way it's obviously undesirable behavior.
> It looks like the OS/filesystem (ext2/3 and reiserfs) does not
> wait for a successful completion. Is this assumption correct?
>
> Thanks,
> mikem

It depends. Obviously if you disconnect your hard drive, the writes
will fail with a time-out. But they fail after a number of retries
(it depends upon the type of disk and its driver). So, if you
"force" a timeout by disconnecting a drive, you don't have
the same situation as a normally failed write.

Disk/file writes go like this (assuming no sync() or fsync()).

(1)  File data gets flushed to a queue.
(2)  When the queue gets nearly full, based upon a LRU mechanism,
      data are written to the disk.
(3)  If the disk-write fails, the driver retries the write.
(4)  If the write continues to fail, i.e., timeout, no disk, etc.
      the kernel gives up and does not hang forever. If you have
      disconnected the drive, you won't have any syslog writes to
      the device so your next boot won't show the event. It looks
      as though it was ignored.

You can observe the behavior by mounting a floppy disk and
then removing it while it is being written. There are many
attempts to write to the device and then that write is discarded.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.11 on an i686 machine (5537.79 BogoMips).
  Notice : All mail here is now cached for review by Dictator Bush.
                  98.36% of all statistics are fiction.


* Re: [Question] Does the kernel ignore errors writng to disk?
  2005-04-27 18:40 [Question] Does the kernel ignore errors writng to disk? mike.miller
  2005-04-27 19:12 ` Richard B. Johnson
@ 2005-04-28 14:58 ` Alan Cox
  2005-04-28 18:14   ` Bryan Henderson
  2005-04-28 23:22   ` Bartlomiej Zolnierkiewicz
  1 sibling, 2 replies; 15+ messages in thread
From: Alan Cox @ 2005-04-28 14:58 UTC (permalink / raw)
  To: mike.miller; +Cc: Linux Kernel Mailing List, linux-scsi, brace

On Mer, 2005-04-27 at 19:40, mike.miller@hp.com wrote:
> It looks like the OS/filesystem (ext2/3 and reiserfs) does not wait for a successful completion. Is this assumption correct?

Of course it doesn't. At 250 ops/second for a decent disk no OS waits
for completions, all batch and asynchronously queue I/O. See man fsync
and also O_DIRECT if you need specific "to disk" support. If you do that
be aware that you must also turn write caching off on the IDE disk. I've
repeatedly asked the "maintainer" of the IDE layer to do this
automatically but gave up bothering long ago. Without that setting users
are playing with fire quite honestly.

The alternative with latest 2.6 stuff is to turn on Jens Axboe's barrier
work which seems to give better performance on a drive new enough to
have cache flush operations.

Alan
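
Note on the fsync() suggestion above: a minimal C sketch, not taken from
the thread, of a buffered write followed by a single fsync(). The helper
name write_and_sync and the file mode are assumptions made for
illustration. On a kernel and filesystem that report deferred write-back
errors, the failure would be expected to surface from fsync() rather than
from write(). What a successful fsync() guarantees still depends on the
filesystem, the driver, and the drive's write cache.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Write len bytes to path, then force them out with fsync().
 * Returns 0 on success, or -1 with errno set on the first failure. */
int write_and_sync(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        /* For buffered I/O this normally just copies into the page cache. */
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted, retry */
            int saved = errno;
            close(fd);
            errno = saved;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Ask the kernel to push the data to the device and wait for it. */
    if (fsync(fd) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return close(fd);
}
```

A caller would treat a -1 return as "the data may not be on disk" and
inspect errno, instead of assuming that a successful write() was enough.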



* Re: [Question] Does the kernel ignore errors writng to disk?
  2005-04-28 14:58 ` Alan Cox
@ 2005-04-28 18:14   ` Bryan Henderson
  2005-04-28 22:43     ` Alan Cox
  2005-04-28 23:22   ` Bartlomiej Zolnierkiewicz
  1 sibling, 1 reply; 15+ messages in thread
From: Bryan Henderson @ 2005-04-28 18:14 UTC (permalink / raw)
  To: Alan Cox; +Cc: brace, Linux Kernel Mailing List, linux-scsi, mike.miller

>See man fsync
>and also O_DIRECT if you need specific "to disk" support

Probably the most common way to get the simple but slow write function 
where the write() call actually writes to stable storage, and fails if it 
can't, is the O_SYNC open flag.

But even that, in some versions of Linux, can miss write errors.  It's not 
easy for Linux to catch them because the code that sees the I/O fail 
doesn't know if it's part of some synchronous procedure where the user 
will eventually find out about the error or the more common case where the 
application has optimistically walked away and nothing can be done but 
write off the loss.

--
Bryan Henderson                          IBM Almaden Research Center
San Jose CA                              Filesystems
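
A minimal C sketch of the O_SYNC approach described above, assuming a
POSIX system; append_record_sync is a hypothetical helper name. With
O_SYNC, write() is meant to return only after the data has been handed to
stable storage, so an I/O error should be reported by write() itself,
subject to the kernel and filesystem caveats raised in this thread.

```c
#include <fcntl.h>
#include <unistd.h>

/* Append one record through a descriptor opened with O_SYNC.
 * Returns 0 if the whole record was written, -1 otherwise. */
int append_record_sync(const char *path, const void *rec, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND | O_SYNC, 0644);
    if (fd < 0)
        return -1;

    /* With O_SYNC the call blocks until the write has completed. */
    ssize_t n = write(fd, rec, len);
    int ok = (n == (ssize_t)len);   /* a short write counts as failure here */

    if (close(fd) < 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

The trade-off is the one named above: every write() pays the full device
latency, which is simple but slow.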


* Re: [Question] Does the kernel ignore errors writng to disk?
  2005-04-28 18:14   ` Bryan Henderson
@ 2005-04-28 22:43     ` Alan Cox
  2005-04-28 23:14       ` Bryan Henderson
  0 siblings, 1 reply; 15+ messages in thread
From: Alan Cox @ 2005-04-28 22:43 UTC (permalink / raw)
  To: Bryan Henderson; +Cc: brace, Linux Kernel Mailing List, linux-scsi, mike.miller

On Iau, 2005-04-28 at 19:14, Bryan Henderson wrote:
> Probably the most common way to get the simple but slow write function 
> where the write() call actually writes to stable storage, and fails if it 
> can't, is the O_SYNC open flag.

O_SYNC doesn't work completely on several file systems and only on the
latest kernels with some of the common ones.

> But even that, in some versions of Linux, can miss write errors.  It's not 
> easy for Linux to catch them because the code that sees the I/O fail 
> doesn't know if it's part of some synchronous procedure where the user 
> will eventually find out about the error or the more common case where the 
> application has optimistically walked away and nothing can be done but 
> write off the loss.

Or because the error is reported out of order and there are ordering
guarantees in the fs. SCSI is OK here; other controllers are not always
right.



* Re: [Question] Does the kernel ignore errors writng to disk?
  2005-04-28 22:43     ` Alan Cox
@ 2005-04-28 23:14       ` Bryan Henderson
  2005-04-29  7:25         ` Anton Altaparmakov
  0 siblings, 1 reply; 15+ messages in thread
From: Bryan Henderson @ 2005-04-28 23:14 UTC (permalink / raw)
  To: Alan Cox; +Cc: brace, Linux Kernel Mailing List, linux-scsi, mike.miller

>O_SYNC doesn't work completely on several file systems and only on the
>latest kernels with some of the common ones.

Hmmm.  You didn't mention such a restriction when you suggested fsync() 
before.  Does fsync() work completely on these kernels where O_SYNC 
doesn't?  Considering that a simple implementation of O_SYNC just does the 
equivalent of an fsync() inside every write(), that would be hard to 
understand.

--
Bryan Henderson                          IBM Almaden Research Center
San Jose CA                              Filesystems
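
The "simple implementation" mentioned above, an fsync() inside every
write(), can be sketched in user-space C as follows. The names sync_write
and sync_write_demo are hypothetical, and this only illustrates the idea;
it is not how any particular kernel implements O_SYNC.

```c
#include <fcntl.h>
#include <unistd.h>

/* One write() followed immediately by fsync() on the same descriptor.
 * Returns the byte count from write(), or -1 if either call fails. */
ssize_t sync_write(int fd, const void *buf, size_t len)
{
    ssize_t n = write(fd, buf, len);
    if (n < 0)
        return -1;
    if (fsync(fd) < 0)
        return -1;
    return n;
}

/* Usage: two records, each flushed before the next one is written. */
int sync_write_demo(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    int ok = sync_write(fd, "first\n", 6) == 6 &&
             sync_write(fd, "second\n", 7) == 7;

    if (close(fd) < 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

If fsync() is a no-op on a given filesystem, this wrapper inherits the
same weakness, which is the point of the question being asked here.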



* Re: [Question] Does the kernel ignore errors writng to disk?
  2005-04-28 14:58 ` Alan Cox
  2005-04-28 18:14   ` Bryan Henderson
@ 2005-04-28 23:22   ` Bartlomiej Zolnierkiewicz
  2005-04-28 23:50     ` Alan Cox
  1 sibling, 1 reply; 15+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2005-04-28 23:22 UTC (permalink / raw)
  To: Alan Cox; +Cc: mike.miller, Linux Kernel Mailing List, linux-scsi, brace

On 4/28/05, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> On Mer, 2005-04-27 at 19:40, mike.miller@hp.com wrote:
> > It looks like the OS/filesystem (ext2/3 and reiserfs) does not wait for a successful completion. Is this assumption correct?
> 
> Of course it doesn't. At 250 ops/second for a decent disk no OS waits
> for completions, all batch and asynchronously queue I/O. See man fsync
> and also O_DIRECT if you need specific "to disk" support. If you do that
> be aware that you must also turn write caching off on the IDE disk. I've
> repeatedly asked the "maintainer" of the IDE layer to do this
> automatically but gave up bothering long ago. Without that setting users

WTF is wrong with you Alan?

We agreed on this, but it is up to you to do the coding if you want it,
not me (and there was never any patch from you).

It is not my (unpaid) job to fulfill any requirement you come up with.

BTW I was supposed to push a git update today but I wasted this time
on replying to your complaints (didn't even bother with personal insults).

> are playing with fire quite honestly.
> 
> The alternative with latest 2.6 stuff is to turn on Jens Axboe's barrier
> work which seems to give better performance on a drive new enough to
> have cache flush operations.


* Re: [Question] Does the kernel ignore errors writng to disk?
  2005-04-28 23:22   ` Bartlomiej Zolnierkiewicz
@ 2005-04-28 23:50     ` Alan Cox
  2005-04-29  0:33       ` Bartlomiej Zolnierkiewicz
  0 siblings, 1 reply; 15+ messages in thread
From: Alan Cox @ 2005-04-28 23:50 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: mike.miller, Linux Kernel Mailing List, linux-scsi, brace

> We agreed on this but it is you to do coding, if you want it,
> not me (and there was never any patch from you).

I gave up sending you patches because they never got applied and all I
got was "change this", or I'd send a security fix and get told it's got the
wrong white spacing for your personal religion.

The bug is still there, and the users still need to know it's dangerous.
Perhaps that way someone will fix it.

Alan



* Re: [Question] Does the kernel ignore errors writng to disk?
  2005-04-28 23:50     ` Alan Cox
@ 2005-04-29  0:33       ` Bartlomiej Zolnierkiewicz
  0 siblings, 0 replies; 15+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2005-04-29  0:33 UTC (permalink / raw)
  To: Alan Cox; +Cc: mike.miller, Linux Kernel Mailing List, linux-scsi, brace

On 4/29/05, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> > We agreed on this but it is you to do coding, if you want it,
> > not me (and there was never any patch from you).
> 
> I gave up sending you patches because they never got applied and all I
> got was "change this" or send a security fix and get told its got wrong

First, to make it clear: you never ever sent any patch
for this _particular_ issue.

Oh, and you've never changed "this" or even explained why not,
so no wonder _some_ of your patches don't get applied.

> white spacing for your personal religion.

Sure I complain about your exotic whitespace and coding
style but I _never_ reject patches because of this.

> The bug is still there, and the users still need to know its dangerous.
> Perhaps that way someone will fix it.

Patches as usual are welcomed.


* Re: [Question] Does the kernel ignore errors writng to disk?
  2005-04-28 23:14       ` Bryan Henderson
@ 2005-04-29  7:25         ` Anton Altaparmakov
  2005-04-29 19:11           ` Bryan Henderson
  0 siblings, 1 reply; 15+ messages in thread
From: Anton Altaparmakov @ 2005-04-29  7:25 UTC (permalink / raw)
  To: Bryan Henderson
  Cc: Alan Cox, brace, Linux Kernel Mailing List, linux-scsi, mike.miller

On Thu, 28 Apr 2005, Bryan Henderson wrote:
> >O_SYNC doesn't work completely on several file systems and only on the
> >latest kernels with some of the common ones.
> 
> Hmmm.  You didn't mention such a restriction when you suggested fsync() 
> before.  Does fsync() work completely on these kernels where O_SYNC 
> doesn't?  Considering that a simple implementation of O_SYNC just does the 
> equivalent of an fsync() inside every write(), that would be hard to 
> understand.

Some file systems implement their fsync() function as "return 0;" so no, 
you cannot rely on it at all.

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/


* Re: [Question] Does the kernel ignore errors writng to disk?
  2005-04-29  7:25         ` Anton Altaparmakov
@ 2005-04-29 19:11           ` Bryan Henderson
  2005-04-29 22:00             ` Alan Cox
  0 siblings, 1 reply; 15+ messages in thread
From: Bryan Henderson @ 2005-04-29 19:11 UTC (permalink / raw)
  To: Anton Altaparmakov
  Cc: aia21, Alan Cox, Linux Kernel Mailing List, linux-scsi, mike.miller

>On Thu, 28 Apr 2005, Bryan Henderson wrote:
>> >O_SYNC doesn't work completely on several file systems and only on the
>> >latest kernels with some of the common ones.
>> 
>> Hmmm.  You didn't mention such a restriction when you suggested fsync() 
>> before.  Does fsync() work completely on these kernels where O_SYNC 
>> doesn't?  Considering that a simple implementation of O_SYNC just does the 
>> equivalent of an fsync() inside every write(), that would be hard to 
>> understand.
>
>Some file systems implement their fsync() function as "return 0;" so no, 
>you cannot rely on it at all.

It's pretty clear Alan isn't talking about those cases.  I don't think he 
would have suggested fsync() to address the delayed write error problem in 
a case where fsync() is "return 0;".

But let's talk about the no-op fsync() cases:  fsync() is supposed to 
cause data to be written to stable storage.  "stable" is a relative 
concept that the individual filesystem type or driver has to define for 
itself.  In an ordinary disk-based filesystem, we usually expect it to 
mean the data has gone onto the oxide.  But that's not really stable -- 
the disk drive could break and the data would be gone.  For some, just 
getting into the buffers of the disk drive is stable enough, since then 
rebooting Linux wouldn't cause the data to be lost.  For ramfs, the Linux 
page cache is as stable as you can hope for.

So I view it as correct even if fsync() does nothing on a disk-based 
filesystem because the programmer was lazy (or because the user wants to 
defeat the performance-busting behavior of some paranoid application). But 
when Alan speaks of a "not completely correct" version of synchronization, 
which makes me think of something that doesn't implement any consistent 
form of "stable," I want to hear more.

--
Bryan Henderson                          IBM Almaden Research Center
San Jose CA                              Filesystems


* Re: [Question] Does the kernel ignore errors writng to disk?
  2005-04-29 19:11           ` Bryan Henderson
@ 2005-04-29 22:00             ` Alan Cox
  2005-04-30  0:41               ` Bryan Henderson
  2005-05-01  9:01               ` Mogens Valentin
  0 siblings, 2 replies; 15+ messages in thread
From: Alan Cox @ 2005-04-29 22:00 UTC (permalink / raw)
  To: Bryan Henderson
  Cc: Anton Altaparmakov, aia21, Linux Kernel Mailing List, linux-scsi,
	mike.miller

On Gwe, 2005-04-29 at 20:11, Bryan Henderson wrote:
> So I view it as correct even if fsync() does nothing on a disk-based 
> filesystem because the programmer was lazy (or because the user wants to 
> defeat the performance-busting behavior of some paranoid application). But 
> when Alan speaks of a "not completely correct" version of synchronization, 
> which makes me think of something that doesn't implement any consistent 
> form of "stable," I want to hear more.

On the main fs's people use, with a current kernel, fsync guarantees the
data went somewhere. What it guarantees beyond that depends on the fs
properties, the driver properties and the media properties.

So ext3 journal=data or jffs which are the strongest guarantee cases
mean that your fsync() data should be on media and stable. Ditto I
believe default ext3 behaviour because fsync has stronger rules than
fdatasync.

The next question is what the I/O device does with the data. SCSI disks
will cache but the scsi layer uses tags and if necessary turns the
cache off on the drive. In other words you should get that behaviour
correctly on SCSI media.

The default IDE behaviour doesn't turn write cache off and the IDE
device may re-order writes and ack them before they hit storage. IDE
lacks tags, and tends to have poor performance on cache flush commands.
With barrier support turned on the right thing should occur, or with
hdparm used to turn the write cache off.

Raid controllers will cache data in their writeback caches, they will
also write and rewrite stripes which can mean a critical failure loses
the cache or involves a whole stripe loss, but that is very unlikely in
most modes. The good ones either write through or have battery backed
caches. The really good ones even let you put the battery/ram unit onto
another card.

Underlying all of this is the fact that disks aren't really disks any
more but NAS devices on funky cables, that can mean you can lose blocks
to drive faults that might not be the block you are currently writing.

Alan



* Re: [Question] Does the kernel ignore errors writng to disk?
  2005-04-29 22:00             ` Alan Cox
@ 2005-04-30  0:41               ` Bryan Henderson
  2005-05-01  9:01               ` Mogens Valentin
  1 sibling, 0 replies; 15+ messages in thread
From: Bryan Henderson @ 2005-04-30  0:41 UTC (permalink / raw)
  To: Alan Cox
  Cc: aia21, Anton Altaparmakov, Linux Kernel Mailing List, linux-scsi,
	linux-scsi-owner, mike.miller

Thanks for the info on how stability works with SCSI and ATA, but I think 
you lost the context of my question.

You said earlier that fsync() and O_DIRECT are ways to deal with the 
problem of delayed write errors.  I added that O_SYNC is another way.  You 
then said that O_SYNC doesn't work completely correctly in some recent 
(but not current) kernels.  You didn't say the same about fsync().

I'd like to know if you mean to say that O_SYNC has some problems in some 
kernels that fsync() does not have.

And if it isn't too much trouble, it would be nice to hear details of how 
O_SYNC is partially correct in some kernels.



* Re: [Question] Does the kernel ignore errors writng to disk?
  2005-04-29 22:00             ` Alan Cox
  2005-04-30  0:41               ` Bryan Henderson
@ 2005-05-01  9:01               ` Mogens Valentin
  1 sibling, 0 replies; 15+ messages in thread
From: Mogens Valentin @ 2005-05-01  9:01 UTC (permalink / raw)
  To: Alan Cox
  Cc: Bryan Henderson, Anton Altaparmakov, aia21,
	Linux Kernel Mailing List, linux-scsi, mike.miller

Alan Cox wrote:
> The next question is what the I/O device does with the data. SCSI disks
> will cache but the scsi layer uses tags and if necessary turns the
> cache off on the drive. In other words you should get that behaviour
> correctly on SCSI media.
> 
> The default IDE behaviour doesn't turn write cache off and the IDE
> device may re-order writes and ack them before they hit storage. IDE
> lacks tags, and tends to have poor performance on cache flush commands.
> With the barrier support on the right thing should occur, or with hdparm
> used to turn the write cache off.

Is this IDE behaviour confined to IDE drives only?
SATA, when using libata, will solemnly be part of the SCSI chain, and 
hence not subject to your mentioned write cache problem, right?

-- 
Kind regards,
Mogens Valentin



* RE: [Question] Does the kernel ignore errors writng to disk?
@ 2005-04-28 15:05 Miller, Mike (OS Dev)
  0 siblings, 0 replies; 15+ messages in thread
From: Miller, Mike (OS Dev) @ 2005-04-28 15:05 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linux Kernel Mailing List, linux-scsi, brace

> -----Original Message-----
> From: Alan Cox [mailto:alan@lxorguk.ukuu.org.uk] 
> Sent: Thursday, April 28, 2005 9:58 AM
> To: Miller, Mike (OS Dev)
> Cc: Linux Kernel Mailing List; linux-scsi@vger.kernel.org; 
> brace@hp.com
> Subject: Re: [Question] Does the kernel ignore errors writng to disk?
> 
> On Mer, 2005-04-27 at 19:40, mike.miller@hp.com wrote:
> > It looks like the OS/filesystem (ext2/3 and reiserfs) does 
> not wait for a successful completion. Is this assumption correct?
> 
> Of course it doesn't. At 250 ops/second for a decent disk no 
> OS waits for completions, all batch and asynchronously queue 
> I/O. See man fsync and also O_DIRECT if you need specific "to 
> disk" support. If you do that be aware that you must also 
> turn write caching off on the IDE disk. I've repeatedly asked 
> the "maintainer" of the IDE layer to do this automatically 
> but gave up bothering long ago. Without that setting users 
> are playing with fire quite honestly.
> 
> The alternative with latest 2.6 stuff is to turn on Jens 
> Axboe's barrier work which seems to give better performance 
> on a drive new enough to have cache flush operations.
> 
> Alan
Thanks, Alan. I'll try Jens' barrier work.

> 
> 
