linux-kernel.vger.kernel.org archive mirror
From: Jeff Garzik <jgarzik@pobox.com>
To: Greg Stark <gsstark@mit.edu>
Cc: Mike Fedyk <mfedyk@matchmail.com>,
	Erik Steffl <steffl@bigfoot.com>,
	linux-kernel@vger.kernel.org
Subject: Re: libata in 2.4.24?
Date: Tue, 2 Dec 2003 14:06:46 -0500
Message-ID: <20031202190646.GA9043@gtf.org>
In-Reply-To: <87iskz9hp6.fsf@stark.dyndns.tv>

On Tue, Dec 02, 2003 at 01:51:17PM -0500, Greg Stark wrote:
> 
> Jeff Garzik <jgarzik@pobox.com> writes:
> 
> > If true, this is an IDE driver bug...  assuming the drive itself
> > doesn't lie about FLUSH CACHE results (a few do).
> 
> I don't think the IDE drivers issue FLUSH CACHE after every write on O_SYNC,
> or after fsync calls. The "lying" discussed on the database lists is when a
> normal write is issued, IDE disks report immediate success even before the
> write hits disk. As far as I know from the lists it seems *all* IDE disks
> behave this way unless write caching is disabled.

The way CONFIG_IDE (the traditional IDE driver) and libata work right
now, when the drive indicates that the read/write is complete, the OS
driver indicates to the filesystem that the data transaction is
complete.

So, today, no acknowledgement occurs until the data _really_ is in the
drive's buffers.

That said, "the database lists" may be seeing page cache effects.
write(2) will certainly report success long before the data transaction
is even sent to the driver!  You must fsync(2) to flush data from the
page cache to the IDE driver.
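
A minimal sketch of the point above, in C, assuming a POSIX system. The
helper name write_then_fsync() is hypothetical and not from any driver or
library. It only shows that a successful write(2) is not a durability
signal and that fsync(2) is the call that pushes the data out of the page
cache. Whether the driver then issues a flush-cache to the drive is outside
the program's control.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes to path, then fsync(2) the file.
 * Returns 0 on success, -1 on any failure.
 */
static int write_then_fsync(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        /* Success here only means the data was copied into the page cache. */
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing; retry */
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Ask the kernel to push the cached data down to the storage driver. */
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

A caller would treat the data as handed to the driver only after
write_then_fsync("journal.dat", rec, rec_len) returns 0; a return from
write(2) alone says nothing about the drive.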


> This doesn't happen with SCSI disks where multiple requests can be pending so
> there's no urgency to reporting a false success. The request doesn't complete
> until the write hits disk. As a result SCSI disks are reliable for database
> operation and IDE disks aren't unless write caching is disabled.

This is not really true.

Regardless of TCQ, if the OS driver has not issued a FLUSH CACHE (IDE)
or SYNCHRONIZE CACHE (SCSI), then the data is not guaranteed to be on
the disk media.  Plain and simple.

If fsync(2) returns without a flush-cache, then your data is not
guaranteed to be on the disk.  And as you noted, flush-cache destroys
performance.


> I'm unclear on which of your #2 or #3 will be the solution though. Do either
> or both of them require that writes actually hit disk before the drive reports
> success? Do either of them allow that semantic without destroying concurrent
> performance?

There are three levels:

a) Data is successfully transferred to the controller/drive queue (TCQ).
b) Data is successfully transferred to the drive's internal buffers.
c) The drive successfully transfers data to the media.

Acknowledgement of (a) is basically instantaneous.  The OS driver simply
adds a drive read/write command to a list that the host controller can
see.

Acknowledgement of (b) happens fairly rapidly, limited by the device's
throughput and seek times, internal buffer load (amount of work to do),
and internal algorithms.

Acknowledgement of (c) _never_ occurs.  One must issue the flush-cache
drive command to be certain that the drive has flushed its write
buffers.
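
The three levels can be restated as a small C model. This is only an
illustration of the description above: the enum and function names are
hypothetical and do not correspond to any kernel interface, and the model
assumes a drive that reports FLUSH CACHE results honestly.

```c
#include <stdbool.h>

/* The three levels (a), (b), (c) from the list above. */
enum ack_level {
    ACK_QUEUED,        /* (a) command placed on the controller/drive queue */
    ACK_DRIVE_BUFFER,  /* (b) data is in the drive's internal buffers */
    ACK_ON_MEDIA       /* (c) data has been written to the media */
};

/* Levels (a) and (b) are acknowledged on their own; (c) never is. */
static bool acked_without_flush(enum ack_level level)
{
    return level != ACK_ON_MEDIA;
}

/*
 * Strongest level a caller may assume for a completed write, depending on
 * whether a flush-cache command has also completed since that write.
 */
static enum ack_level strongest_guarantee(bool flush_cache_completed)
{
    return flush_cache_completed ? ACK_ON_MEDIA : ACK_DRIVE_BUFFER;
}
```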

	Jeff




