From: "Fernando Luis Vázquez Cao" <fernando@oss.ntt.co.jp>
To: Dave Chinner <david@fromorbit.com>
Cc: Fernando Luis Vazquez Cao <fernando@kic.ac.jp>,
	Eric Sandeen <sandeen@redhat.com>, Jan Kara <jack@suse.cz>,
	Theodore Tso <tytso@MIT.EDU>, Alan Cox <alan@lxorguk.ukuu.org.uk>,
	Pavel Machek <pavel@suse.cz>,
	kernel list <linux-kernel@vger.kernel.org>,
	Jens Axboe <jens.axboe@oracle.com>,
	Ric Wheeler <rwheeler@redhat.com>
Subject: Re: vfs: Add MS_FLUSHONFSYNC mount flag
Date: Sun, 15 Feb 2009 16:11:50 +0900
Message-ID: <1234681910.19783.207.camel@sebastian.kern.oss.ntt.co.jp>
In-Reply-To: <20090215024807.GZ8830@disturbed>

On Sun, 2009-02-15 at 13:48 +1100, Dave Chinner wrote:
> On Sat, Feb 14, 2009 at 10:03:53PM +0900, Fernando Luis Vázquez Cao wrote:
> > On Sat, 2009-02-14 at 22:24 +1100, Dave Chinner wrote:
> > > On Sat, Feb 14, 2009 at 01:29:28AM +0900, Fernando Luis Vazquez Cao wrote:
> > > > On Fri, 2009-02-13 at 23:20 +1100, Dave Chinner wrote:
> > > > > On Fri, Feb 13, 2009 at 12:20:17AM -0600, Eric Sandeen wrote:
> > > > > > I'm just a little leery of the "dangerous" mount option proliferation, I
> > > > > > guess.
> > > > > 
> > > > > You're not the only one, Eric. It's bad enough having to explain to
> > > > > users what barriers do once they have lost data after a power loss,
> > > > > let alone confusing them further by adding more mount options they
> > > > > will get wrong by accident....
> > > > 
> > > > That is precisely the reason why we should use sensible defaults, which
> > > > in this case means enabling barriers and flushing disk caches on
> > > > fsync()/fdatasync() by default.
> > > > 
> > > > Adding either a new mount option (as you yourself suggest below) or a
> > > > sysfs tunable is desirable for those cases when we really do not need to
> > > > flush the disk write cache to guarantee integrity (battery-backed block
> > > > devices come to mind), or we want to be fast at the cost of potentially
> > > > losing some data.
> > > 
> > > Mount options are the wrong place for this. if you want to change
> > > the behaviour of the block device, then it should be at that level.
> > 
> > To be more precise, what we are trying to change is the behavior of
> > fsync()/fdatasync(), which users might want to change on a per-partition
> > basis. I guess this is the reason the barrier switch was made a mount
> > option, and I just wanted to be consistent.
> 
> This has no place in the kernel. Use LD_PRELOAD to make fsync() a
> no-op.

The purpose of flushonfsync is not to make fsync() a no-op, and it goes
beyond what we can currently achieve with LD_PRELOAD. For example, if we
send the data to disk but avoid flushing the block device's write cache
we can potentially improve I/O performance at the cost of compromising
data and filesystem integrity. This is a risk that those who play fast
and loose may want to assume. By the way, sadly enough this is the way
many of the filesystems in Linus' tree behave. I just wanted to change
this situation by making all filesystems issue write-cache flushes by
default.
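
For comparison, the LD_PRELOAD approach suggested above would look
roughly like the sketch below. The file and library names in the comment
are only assumptions for illustration. Note that such a shim skips the
writeback itself, so the data may never leave the page cache, whereas
the behavior discussed here still writes the data to the device and only
skips the write-cache flush.

```c
/*
 * Sketch of an LD_PRELOAD shim that turns fsync()/fdatasync() into no-ops.
 * Assumed build and use (names are illustrative only):
 *   gcc -shared -fPIC -o libnofsync.so nofsync.c
 *   LD_PRELOAD=./libnofsync.so some_program
 */
#include <unistd.h>

/*
 * Same signatures as the libc versions, so the dynamic linker resolves
 * the program's calls to these definitions instead.
 */
int fsync(int fd)
{
    (void)fd;   /* nothing is written back and nothing is flushed */
    return 0;
}

int fdatasync(int fd)
{
    (void)fd;
    return 0;
}
```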

Some people suggested leaving a knob for those who want to revert to
the old behavior, and I myself thought that it could make sense in some
cases, so I decided to add the flushonfsync tunable.

If there is consensus that flushonfsync should be a per-device tunable,
I am more than willing to make it so. My goal is to fix all filesystems
so that they emit barriers and disk flushes when they should.
flushonfsync is just a nicety I added for those who, for whatever
reason, still want the old behavior.

For the next iteration of this patchset I will take out the contentious
bits and leave only the filesystem/VFS fixes so that we can move forward
while we discuss the propriety of adding a per-device or a
per-filesystem tunable such as flushonfsync to change the default (and
safe) behavior.

> > > No mount option - too confusing for someone to work out what
> > > combination of barriers and flushing for things to work correctly.
> > 
> > As I suggested in a previous email, it is just a matter of using a safe
> > combination by default so that users do not need to figure out anything.
> 
> Too many users think that they need to specify everything rather
> than rely on defaults...

Well, that is their business. In my experience, most admins in the field
do not stray from the defaults provided by their enterprise distro.

> > > Just make filesystems issue the necessary flush calls or barrier IOs
> > 
> > "ext3: call blkdev_issue_flush on fsync" and "ext4: call
> > blkdev_issue_flush on fsync" in this patch set implement just that for
> > ext3/4.
> > 
> > >  and allow the block devices to ignore flushes.
> > 
> > Wouldn't it make more sense to avoid sending bios down the block layer
> > which we can know in advance are going to be ignored by the block
> > device?
> 
> As soon as the block layer reports EOPNOTSUPPORTED to a barrier IO,
> the filesystem will switch them off and not issue them anymore.

Yes, that certainly makes sense. But the point under discussion is
whether users should be allowed to switch them off (it arguably makes
sense in some scenarios). I am afraid that some users will not be happy
if we do not leave the door open for them to revert to the old behavior.
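
The automatic fallback described in the quoted text can be modeled in
user space as follows. This is only a sketch: struct fs_model, the
device callbacks and fs_issue_barrier() are hypothetical names, not
kernel interfaces, and the error code is written as EOPNOTSUPP, which is
the errno constant's actual spelling.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for a block device: returns 0 on success or a
 * negative errno, as kernel-style interfaces do.
 */
typedef int (*barrier_fn)(void);

struct fs_model {
    bool barriers_enabled;    /* models the filesystem's barrier setting */
    unsigned barriers_sent;   /* barrier requests that reached the device */
};

static int dev_without_barriers(void) { return -EOPNOTSUPP; }
static int dev_with_barriers(void)    { return 0; }

/*
 * Issue one barrier if barriers are still enabled. On -EOPNOTSUPP the
 * model switches barriers off for good and does not report an error.
 */
static int fs_issue_barrier(struct fs_model *fs, barrier_fn dev)
{
    int ret;

    if (!fs->barriers_enabled)
        return 0;             /* already disabled: nothing is sent */

    fs->barriers_sent++;
    ret = dev();
    if (ret == -EOPNOTSUPP) {
        fs->barriers_enabled = false;
        return 0;
    }
    return ret;
}
```

In this model the device, not the user, decides when barriers stop
being issued, which is why it does not cover the case of a user who
wants them off on a device that supports them.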

> > > I don't think we want (1) at all, and I thought that if ext3/4 are using
> > > barriers then the barrier I/O issued by the journal does the flush
> > > already. Hence (3) is redundant, right?
> > 
> > No, it is not redundant because a barrier is not issued in all cases. The
> > aforementioned two patches fix ext3/4 by emitting a device flush only
> > when necessary (i.e. when a barrier would not be emitted).
> 
> Then that is a filesystem fix, not something that requires VFS
> modifications or new mount options....

Yup. As mentioned above, flushonfsync is just a nicety I added in the
second iteration of this patchset, and it is independent of the
filesystem fixes.
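
To illustrate the rule the ext3/4 fixes discussed above follow (flush
explicitly only when no barrier would be emitted), here is a small
user-space model. It is not the code from the patches: model_fsync(),
struct fsync_outcome and the flush_on_fsync parameter are hypothetical
names chosen for this sketch.

```c
#include <stdbool.h>

/* What was sent to the device during one fsync() call in this model. */
struct fsync_outcome {
    bool barrier_issued;   /* the journal commit went out as a barrier */
    bool flush_issued;     /* an explicit write-cache flush was sent */
};

/*
 * commit_needed:   fsync() had to force a journal commit
 * barriers_on:     the filesystem has barriers enabled
 * flush_on_fsync:  hypothetical knob; false models the old behavior
 */
static struct fsync_outcome model_fsync(bool commit_needed,
                                        bool barriers_on,
                                        bool flush_on_fsync)
{
    struct fsync_outcome out = { false, false };

    /* A commit written with a barrier already flushes the write cache. */
    if (commit_needed && barriers_on)
        out.barrier_issued = true;

    /*
     * Without a barrier the data may still sit in the volatile cache,
     * so an explicit flush is sent unless the knob turns it off.
     */
    if (!out.barrier_issued && flush_on_fsync)
        out.flush_issued = true;

    return out;
}
```

With the knob left on, every path through the model ends in either a
barrier or an explicit flush, never both and never neither.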

Regards,

Fernando


