linux-kernel.vger.kernel.org archive mirror
From: Bill Davidsen <davidsen@tmr.com>
To: Jamie Lokier <jamie@shareable.org>
Cc: linux-kernel@vger.kernel.org
Subject: Re: libata in 2.4.24?
Date: Sun, 7 Dec 2003 00:33:22 -0500 (EST)
Message-ID: <Pine.LNX.3.96.1031207002609.7168A-100000@gatekeeper.tmr.com>
In-Reply-To: <20031203004736.GB27306@mail.shareable.org>

On Wed, 3 Dec 2003, Jamie Lokier wrote:

> bill davidsen wrote:
> > With O_SYNC files there is the possibility of having a don't cache bit
> > in the packet to the drive, even with write caching. With fsync I don't
> > see any way to do it after the fact for only some of the data in the
> > drive cache. That's just an observation.
> 
> With fsync, can't you write all the dirty pages with that bit set,
> write _again_ all the pages in RAM which are clean but which have
> never been written with the don't-cache bit, and read-then-write with
> the bit set all the pages which are not in RAM but which were dirtied
> and written without the don't cache bit set?

Actually, what I meant was to pass the bit to the drive, so that it would
not return completion status until physical completion. That can be done
for O_SYNC. But fsync() is after the fact: the o/s has no way of knowing
whether the pages already sent to the drive have been written to media, so
all it can do is issue a cache flush and flush everything.
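
For anyone following along from user space, here is a minimal sketch of
the two cases being compared. The helper names are made up for
illustration, and it assumes a POSIX system where Python exposes
os.O_SYNC. Note that it only shows where the kernel learns that the data
must be stable: per write() with O_SYNC, or after the fact with fsync().
Whether the bytes really reach the media still depends on the kernel,
the filesystem and the drive's write cache, which is the whole point of
this thread.

```python
import os
import tempfile


def write_with_o_sync(path, data):
    """O_SYNC case: the kernel knows at write() time that this data must
    be stable, so it could (in principle) tag the request to the drive."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC, 0o600)
    try:
        return os.write(fd, data)
    finally:
        os.close(fd)


def write_then_fsync(path, data):
    """fsync() case: the write is issued as an ordinary cached write, and
    stability is only requested afterwards for the whole file."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        written = os.write(fd, data)
        os.fsync(fd)  # after the fact: flush whatever is still pending
        return written
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        n1 = write_with_o_sync(os.path.join(tmp, "osync.dat"), b"record 1\n")
        n2 = write_then_fsync(os.path.join(tmp, "fsync.dat"), b"record 2\n")
        print(n1, n2)
```

In the first helper every write() carries the synchronous intent with
it; in the second the kernel is only told once the data may already have
been handed to the drive.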

I don't think there's anything else you can do with fsync(), with or
without the bit, since you may no longer have the buffers to send with the
bit set. Moreover, some drives (reportedly IBM) tend to botch a sector
that is being written during a power failure. If I understand your
proposal, it *could* result in a buffer being sent to the drive twice, and
a power failure during the second write could clobber the good data you
wrote.

That's assuming I understand what you propose.

> 
> I know, it sounds a bit complicated :)
> 
> But would it work?

Maybe a guru will disagree, but I would say that the right solution is
just to switch to what SCSI does: cache the write on the drive and don't
return done status until it completes. Maybe a drive vendor will do that;
maybe it's in the SATA-2 spec. I was just making up the bit which
indicated a realtime write buffer, as a way O_SYNC could work without
killing performance.

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.



