From: "Fernando Luis Vázquez Cao" <fernando@oss.ntt.co.jp>
To: Dave Chinner <david@fromorbit.com>
Cc: Fernando Luis Vazquez Cao <fernando@kic.ac.jp>,
	Eric Sandeen <sandeen@redhat.com>, Jan Kara <jack@suse.cz>,
	Theodore Tso <tytso@MIT.EDU>, Alan Cox <alan@lxorguk.ukuu.org.uk>,
	Pavel Machek <pavel@suse.cz>,
	kernel list <linux-kernel@vger.kernel.org>,
	Jens Axboe <jens.axboe@oracle.com>,
	Ric Wheeler <rwheeler@redhat.com>
Subject: Re: vfs: Add MS_FLUSHONFSYNC mount flag
Date: Sat, 14 Feb 2009 22:03:53 +0900
Message-ID: <1234616633.19783.91.camel@sebastian.kern.oss.ntt.co.jp>
In-Reply-To: <20090214112443.GY8830@disturbed>

On Sat, 2009-02-14 at 22:24 +1100, Dave Chinner wrote:
> On Sat, Feb 14, 2009 at 01:29:28AM +0900, Fernando Luis Vazquez Cao wrote:
> > On Fri, 2009-02-13 at 23:20 +1100, Dave Chinner wrote:
> > > On Fri, Feb 13, 2009 at 12:20:17AM -0600, Eric Sandeen wrote:
> > > > I'm just a little leery of the "dangerous" mount option proliferation, I
> > > > guess.
> > > 
> > > You're not the only one, Eric. It's bad enough having to explain to
> > > users what barriers do once they have lost data after a power loss,
> > > let alone confusing them further by adding more mount options they
> > > will get wrong by accident....
> > 
> > That is precisely the reason why we should use sensible defaults, which
> > in this case means enabling barriers and flushing disk caches on
> > fsync()/fdatasync() by default.
> > 
> > Adding either a new mount option (as you yourself suggest below) or a
> > sysfs tunable is desirable for those cases when we really do not need to
> > flush the disk write cache to guarantee integrity (battery-backed block
> > devices come to mind), or we want to be fast at the cost of potentially
> > losing some data.
> 
> Mount options are the wrong place for this. if you want to change
> the behaviour of the block device, then it should be at that level.

To be more precise, what we are trying to change is the behavior of
fsync()/fdatasync(), which users might want to change on a per-partition
basis. I guess this is the reason the barrier switch was made a mount
option, and I just wanted to be consistent.

My fear is that making one of them a mount option (barriers) and the
other a sysfs-tunable block device property (device flushes on fsync())
might end up creating more confusion than using a mount option for both.

Anyway, I do not have strong feelings on this issue and if there is
consensus I would be willing to change the patches so that flushonfsync
becomes a per block device tunable instead.

> > > Quite frankly, the VFS should do stuff that is slow and safe
> > > and filesystems can choose to ignore the VFS (via filesystem
> > > specific mount options) if they want to be fast and potentially
> > > unsafe.
> > 
> > To avoid unnecessary flushes and allow for filesystem-specific
> > optimizations I was considering the following approach:
> > 
> > 1- Add flushonfsync mount option (as an aside, I am of the opinion that
> > it should be set by default).
> 
> No mount option - too confusing for someone to work out what
> combination of barriers and flushing for things to work correctly.

As I suggested in a previous email, it is just a matter of using a safe
combination by default so that users do not need to figure out anything.

> Just make filesystems issue the necessary flush calls or barrier IOs

"ext3: call blkdev_issue_flush on fsync" and "ext4: call
blkdev_issue_flush on fsync" in this patch set implement just that for
ext3/4.

>  and allow the block devices to ignore flushes.

Wouldn't it make more sense to avoid sending bios down the block layer
when we know in advance that the block device is going to ignore them?

> > 2- Modify file_fsync() so that it checks whether FLUSHONFSYNC is set and
> > flushes the underlying device accordingly. With this we would cover all
> > filesystems that use the vfs-provided file_fsync() as their fsync method
> > (commonly used filesystems such as fat fall in this group).
> 
> Just make it flush the block device.

I wrote a patch that does exactly that but, in addition, it checks
whether FLUSHONFSYNC is set to avoid sending unnecessary flushes down
the block layer (this patch is not included in this patch-set, but I
will add it in the next iteration).

As I mentioned above, if everyone thinks this small optimization is
inelegant or an undesirable layering violation I will remove it.
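
A minimal userspace sketch of the check described above, not the actual
patch: the flag bit value, the sketch_sb structure and the sketch_*
helpers are hypothetical stand-ins for MS_FLUSHONFSYNC, the superblock
and the block-layer flush call, and the writeback step is elided.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit; the real MS_FLUSHONFSYNC value is set by the patch. */
#define SKETCH_MS_FLUSHONFSYNC (1UL << 0)

/* Stand-in for the superblock: mount flags plus a flush counter. */
struct sketch_sb {
	unsigned long s_flags;
	unsigned int flushes;	/* flushes sent down to the "device" */
};

/* Stand-in for issuing a cache flush to the underlying block device. */
static void sketch_issue_flush(struct sketch_sb *sb)
{
	sb->flushes++;
}

/* Models a generic fsync method: write back, then flush only if asked. */
static int sketch_file_fsync(struct sketch_sb *sb)
{
	assert(sb != NULL);

	/* ... write back dirty data and metadata here (elided) ... */

	if (sb->s_flags & SKETCH_MS_FLUSHONFSYNC)
		sketch_issue_flush(sb);

	return 0;
}
```

With the flag clear no flush is sent down at all, which is the
optimization under discussion; dropping the check makes the function
flush unconditionally.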

> > 3- Advanced filesystems (ext3/4, XFS, btrfs, etc) which provide their
> > own fsync implementations are allowed to perform filesystem-specific
> > optimizations there to minimize the number of flushes and maximize
> > throughput.
> 
> Um, you are describing what we already have in place.  Almost every
> filesystem provides it's own ->fsync method, not just the "advanced"
> ones.

Yes, I know. There are some notable exceptions such as fat, though.

>  It is those methods that need to be fixed to issue flushes, not just
> file_fsync().

Exactly, and this patch-set is my first attempt at that. For the first
submission I limited myself to fixing only ext3/4 so that I can get some
early feedback on my approach before moving forward.

> > In this patch set I implemented (1) and (3) for ext3/4 to have some code
> > to comment on.
> 
> I don't think we want (1) at all, and I thought that if ext3/4 are using
> barriers then the barrier I/O issued by the journal does the flush
> already. Hence (3) is redundant, right?

No, it is not redundant because a barrier is not issued in all cases. The
aforementioned two patches fix ext3/4 by emitting a device flush only
when necessary (i.e. when a barrier would not be emitted).
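
The rule can be sketched as a small decision function. This is a
userspace model under stated assumptions, not code from the patches: it
assumes a barrier is emitted only when barriers are enabled and the
fsync actually forced a journal commit, and every name in it is
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what happened during one fsync call. */
struct sketch_fsync_state {
	bool barriers_enabled;	/* filesystem mounted with barriers on */
	bool commit_forced;	/* fsync triggered a journal commit */
};

/*
 * Returns true when an explicit device flush is needed, i.e. when no
 * barrier was emitted on behalf of this fsync (assumption: a barrier
 * goes out only with a journal commit while barriers are enabled).
 */
static bool sketch_needs_explicit_flush(const struct sketch_fsync_state *st)
{
	assert(st != NULL);

	bool barrier_emitted = st->barriers_enabled && st->commit_forced;

	return !barrier_emitted;
}
```

In this model an fsync that did not force a commit still gets an
explicit flush even with barriers enabled, which is the case the journal
barrier alone does not cover.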

My impression is that we all agree on the basic approach, the only point
of contention being whether filesystems/VFS should be allowed to
optimize out disk flushes when the user told the kernel to do so (be it
via a sysfs tunable or mount option).

Cheers,

Fernando

