linux-kernel.vger.kernel.org archive mirror
From: Hans Reiser <reiser@namesys.com>
To: Matthias Andree <matthias.andree@gmx.de>
Cc: Jens Axboe <axboe@suse.de>,
	Heikki Tuuri <Heikki.Tuuri@innodb.com>,
	linux-kernel@vger.kernel.org,
	Alexander Zarochentcev <zam@namesys.com>
Subject: Re: True fsync() in Linux (on IDE)
Date: Mon, 22 Mar 2004 22:33:05 +0300
Message-ID: <405F3F71.9090604@namesys.com>
In-Reply-To: <20040322151712.GB32519@merlin.emma.line.org>

Matthias Andree wrote:

>Jens Axboe wrote on 2004-03-22:
>
>>There's no such thing as atomic writes bigger than a sector really, we
>>just pretend there is. Timing usually makes this true.

Can you explain about the timing?

>If there is no such atomicity (except maybe in ext3fs data=journal or
>the upcoming reiserfs4 - isn't there?), then nobody should claim so.
>
Well, nobody is going to use anything except reiser4, are they? ;-)

I think that we are able to guarantee that the write is fully atomic 
regardless of what the block layer does, so long as the block layer 
respects our ordering and does not cache it where it should not.
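
As a rough illustration of why ordering is the key requirement, here is a minimal user-space sketch of a generic write-ahead scheme. It is not reiser4's actual mechanism; the helper names (`journaled_write`, `read_commit`, `recover`), the record layout, and the 512-byte sector size are assumptions made for the example. It also assumes that fsync() really reaches the platter, which is exactly what this thread is questioning for IDE write caches.

```python
import os
import struct
import zlib

SECTOR = 512  # assumed atomic write unit


def read_commit(journal_path):
    """Return (offset, payload) if the journal holds a committed update, else None."""
    try:
        size = os.path.getsize(journal_path)
        with open(journal_path, "rb") as j:
            header = j.read(SECTOR)
            if len(header) < SECTOR:
                return None
            offset, length, crc = struct.unpack_from("<QQI", header)
            if length == 0 or length > size - SECTOR:
                return None
            payload = j.read(length)
    except FileNotFoundError:
        return None
    if len(payload) != length or zlib.crc32(payload) != crc:
        return None
    return offset, payload


def recover(journal_path, data_path):
    """Replay a committed update in place; harmless to repeat after a crash."""
    tx = read_commit(journal_path)
    if tx is None:
        return False
    offset, payload = tx
    with open(data_path, "r+b") as d:
        d.seek(offset)
        d.write(payload)
        d.flush()
        os.fsync(d.fileno())  # in-place data must be stable before the journal is retired
    with open(journal_path, "r+b") as j:
        j.write(b"\0" * SECTOR)  # retire the journal by zeroing the commit record
        j.flush()
        os.fsync(j.fileno())
    return True


def journaled_write(journal_path, data_path, offset, payload):
    """Make a multi-sector update all-or-nothing, given honest ordering."""
    with open(journal_path, "wb") as j:
        j.write(b"\0" * SECTOR)  # sector 0 is reserved for the commit record
        j.write(payload)
        j.flush()
        os.fsync(j.fileno())  # ordering point 1: payload before commit record
        record = struct.pack("<QQI", offset, len(payload), zlib.crc32(payload))
        j.seek(0)
        j.write(record.ljust(SECTOR, b"\0"))  # single-sector commit record
        j.flush()
        os.fsync(j.fileno())  # ordering point 2: commit before in-place write
    return recover(journal_path, data_path)
```

Calling `recover()` at startup either replays a fully committed update or does nothing. If the layer below reorders writes across the two ordering points, or caches them while reporting completion, the guarantee is gone.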

zam, you are watching this thread about flushing the IDE cache, I hope.

>If the kernel cannot 100.00000000% guarantee the write is atomic,
>claiming otherwise is plain fraud and nothing else.
>
>Some people bet their whole business/company and hence a fair deal of
>their belongings on a single database, and making them believe facts
>that simply aren't reality is dangerous. These people will have very
>little understanding for sloppiness here. Linux has no obligation to be
>fast or reliable, but it MUST PROPERLY AND TRUTHFULLY state what it can
>guarantee and what it cannot guarantee.
>
>>For bigger atomic writes, 2.4 SUSE kernel had some nasty hack (called
>>blk-atomic) to prevent reordering by the io scheduler to avoid partial
>>blocks from databases.
>
>That does not make a write atomic if the scheduled blocks are still
>written one at a time (and I believe tagged command queueing won't help
>to unroll partial writes either).
>
>If the hardware support is missing, it is prudent to say just that and
>not make any bogus promises about platter inertia and "it usually
>works". (who says that the filter curves adjust to the decreasing
>platter speed and the electronics are sustained for long enough? how
>about write verify and remapping broken blocks?)
>
>So we only write one hardware block size atomically, usually 512 bytes
>on ATA and SCSI disk drives (MO might do 2048 at a time, but why
>introduce complexity).  That's a data point in this whole fsync()
>discussion.
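
To make the 512-byte data point concrete, here is a small sketch of what a torn write looks like to a database and how a per-page checksum exposes it. The 16 KiB page size and the helper names (`seal`, `is_intact`, `torn`) are assumptions for illustration only, and the simulation assumes sectors are written in ascending order. A checksum only detects the partial write; it does not make the write atomic.

```python
import zlib

SECTOR = 512      # atomic unit quoted above for ATA and SCSI disks
PAGE = 16 * 1024  # illustrative database page size (an assumption)


def seal(body):
    """Append a CRC32 trailer so a reader can detect a partially written page."""
    if len(body) != PAGE - 4:
        raise ValueError("body must fill the page minus the 4-byte trailer")
    return body + zlib.crc32(body).to_bytes(4, "little")


def is_intact(page):
    """True if the trailer matches the body, i.e. the page is not torn."""
    if len(page) != PAGE:
        return False
    return zlib.crc32(page[:-4]) == int.from_bytes(page[-4:], "little")


def torn(old_page, new_page, sectors_written):
    """Simulate power loss after only `sectors_written` sectors reached the disk."""
    cut = sectors_written * SECTOR
    return new_page[:cut] + old_page[cut:]
```

With these sizes one page spans `PAGE // SECTOR` sectors, so a crash can leave any prefix of the new page mixed with the remainder of the old one; `is_intact(torn(old, new, k))` is what recovery would check before trusting the page.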

-- 
Hans

