linux-kernel.vger.kernel.org archive mirror
From: Peter Zaitsev <peter@mysql.com>
To: reiser@namesys.com
Cc: Chris Mason <mason@suse.com>, Jens Axboe <axboe@suse.de>,
	Linux Kernel <linux-kernel@vger.kernel.org>
Subject: Re: True  fsync() in Linux (on IDE)
Date: Fri, 19 Mar 2004 12:06:11 -0800
Message-ID: <1079726769.2446.233.camel@abyss.local>
In-Reply-To: <405B4BA3.2030205@namesys.com>

On Fri, 2004-03-19 at 11:36, Hans Reiser wrote:

> mysql fsync()'s a file, which it thinks guarantees that all of a mysql 
> transaction has reached disk.  The disk write caches it.  You let fsync 
> return.  It is not on disk.  mysql performs its mysql commit, and writes 
> a mysql commit record which reaches disk, but not all of the transaction 
> is on disk.  The system crashes.  mysql plays the log.  mysql has 
> internal corruption.  User  calls Peter.  Peter asks, what do you expect 
> when you use a piece of shit like reiserfs?  User doesn't care about our 
> internal squabbling and goes back to using windows which does proper 
> commits.

This is right.

We had some unexplained data corruptions in InnoDB which can be
explained by a broken fsync(), but in most cases the scenario is less
gloomy.  Users just do not see some of the last committed transactions
if they test durability by shutting off the power, which is however
already not good enough for critical applications.

However, the scenario is less gloomy only because of an extra precaution
InnoDB takes.  It uses a "double write buffer", which basically means
each page is first written to a small page-based log file, and only
afterwards written to its proper place on the disk.  We have to do this
even with a proper fsync() implementation, as it is still possible to
crash in the middle of an fsync() (or synchronous write), which results
in a partial page write.  Think, for example, about the case where a
page crosses a stripe boundary on RAID.


If the file system guaranteed atomicity of write() calls (synchronous
writes would be enough), we could disable the double write buffer and
get a good performance gain.



-- 
Peter Zaitsev, Senior Support Engineer
MySQL AB, www.mysql.com

Meet the MySQL Team at User Conference 2004! (April 14-16, Orlando,FL)
  http://www.mysql.com/uc2004/

