All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Fernando Luis Vázquez Cao" <fernando@oss.ntt.co.jp>
To: Jan Kara <jack@suse.cz>
Cc: Theodore Tso <tytso@MIT.EDU>, Alan Cox <alan@lxorguk.ukuu.org.uk>,
	Pavel Machek <pavel@suse.cz>,
	kernel list <linux-kernel@vger.kernel.org>,
	Jens Axboe <jens.axboe@oracle.com>,
	sandeen@redhat.com, fernando@kic.ac.jp
Subject: Re: ext3: call blkdev_issue_flush on fsync
Date: Wed, 28 Jan 2009 18:45:13 +0900	[thread overview]
Message-ID: <1233135913.5399.57.camel@sebastian.kern.oss.ntt.co.jp> (raw)
In-Reply-To: <20090119120349.GA10193@duck.suse.cz>

On Mon, 2009-01-19 at 13:03 +0100, Jan Kara wrote:
> On Sat 17-01-09 19:00:49, Fernando Luis Vázquez Cao wrote:
> > On Sat, 2009-01-17 at 18:47 +0900, Fernando Luis Vázquez Cao wrote:
> > > On Fri, 2009-01-16 at 17:30 +0100, Jan Kara wrote:
> > > > On Fri 16-01-09 22:55:01, Fernando Luis Vázquez Cao wrote:
> > > > > To ensure that bits are truly on-disk after an fsync or fdatasync, we
> > > > > should force a disk flush explicitly when there is dirty data/metadata
> > > > > and the journal didn't emit a write barrier (either because metadata is
> > > > > not being synched or barriers are disabled).
> > > > > 
> > > > > Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
> > > > > ---
> > > >   Only two minor nits:
> > > > 
> > > > > --- linux-2.6.29-rc1-orig/fs/ext3/fsync.c	2008-12-25 08:26:37.000000000 +0900
> > > > > +++ linux-2.6.29-rc1/fs/ext3/fsync.c	2009-01-16 22:18:53.000000000 +0900
> > > > > @@ -27,6 +27,7 @@
> > > > >  #include <linux/sched.h>
> > > > >  #include <linux/writeback.h>
> > > > >  #include <linux/jbd.h>
> > > > > +#include <linux/blkdev.h>
> > > > >  #include <linux/ext3_fs.h>
> > > > >  #include <linux/ext3_jbd.h>
> > > > >  
> > > > > @@ -45,6 +46,8 @@
> > > > >  int ext3_sync_file(struct file * file, struct dentry *dentry, int datasync)
> > > > >  {
> > > > >  	struct inode *inode = dentry->d_inode;
> > > > > +	journal_t *journal = EXT3_SB(inode->i_sb)->s_journal;
> > > > > +	unsigned long i_state = inode->i_state;
> > > > >  	int ret = 0;
> > > > >  
> > > > >  	J_ASSERT(ext3_journal_current_handle() == NULL);
> > > > > @@ -69,23 +72,33 @@ int ext3_sync_file(struct file * file, s
> > > > >  	 */
> > > > >  	if (ext3_should_journal_data(inode)) {
> > > > >  		ret = ext3_force_commit(inode->i_sb);
> > > > > +		if (!(journal->j_flags & JFS_BARRIER))
> > > > > +			goto no_journal_barrier;
> > > > >  		goto out;
> > > > >  	}
> > > > >  
> > > > > -	if (datasync && !(inode->i_state & I_DIRTY_DATASYNC))
> > > > > -		goto out;
> > > > > +	if (datasync && !(i_state & I_DIRTY_DATASYNC))
> > > > > +		goto flush_blkdev;
> > > > >  
> > > > >  	/*
> > > > >  	 * The VFS has written the file data.  If the inode is unaltered
> > > > >  	 * then we need not start a commit.
> > > > >  	 */
> > > > > -	if (inode->i_state & (I_DIRTY_SYNC|I_DIRTY_DATASYNC)) {
> > > > > +	if (i_state & (I_DIRTY_SYNC|I_DIRTY_DATASYNC)) {
> > > > >  		struct writeback_control wbc = {
> > > > >  			.sync_mode = WB_SYNC_ALL,
> > > > >  			.nr_to_write = 0, /* sys_fsync did this */
> > > > >  		};
> > > > >  		ret = sync_inode(inode, &wbc);
> > > > > +		if (journal && !(journal->j_flags & JFS_BARRIER))
> > > > > +			goto no_journal_barrier;
> > > >   I cannot imagine "journal" will be NULL here.
> > > 
> > > I'll try to check whether that is always so just in case.
> > > 
> > > >   And we can also optimize here a bit and do "goto out" because here
> > > > we know the barrier has been issued.
> > > 
> > > Yep, I was considering the same optimization. By the way, I was
> > > wondering if we should honor ext3 and ext4's "barrier" mount option for
> > > sys_fsync()/sys_fdatasync() and do not force a flush when "barrier=1".
> > 
> > The last phrase should read " do not force a flush when "barrier=0" ".
>   I was hesitating about this a bit. But I don't think so. The reason is
> that POSIX (or any other reasonable specification) mandates that fsync()
> should return only after the data is safely on storage. So if we don't
> flush blockdevice caches, we effectively violate POSIX and we should never
> do that. With barriers the matter is a bit different - that is just a
> filesystem specific thing, no standard guarantees anything.

Hi Jan,

Sorry it's taken me so long to get back to you.

Thinking a lit bit more about this issue, it occurred to me that adding
a new mount option à la existing "barrier" is likely to be preferable.
As an example where such an option could make sense, let's consider a
system with battery-backup cache devices. Since the battery-backup
guarantees the data still not committed to the platter will not vanish
in the event of a power down, it should be possible to obtain a
performance gain by optimizing out the device flush on every
fsync()/fdatasync() call.

If there is consensus on the propriety of this approach, I will send
updated patches.

Regards,

Fernando


  reply	other threads:[~2009-01-28  9:45 UTC|newest]

Thread overview: 74+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-01-13 13:14 ext2 + -osync: not as easy as it seems Pavel Machek
2009-01-13 13:45 ` Alan Cox
2009-01-13 14:03   ` Theodore Tso
2009-01-13 14:07     ` Jens Axboe
2009-01-13 14:26       ` [PATCH] block: Fix documentation for blkdev_issue_flush() Theodore Ts'o
2009-01-13 14:28         ` Jens Axboe
2009-01-13 14:30     ` ext2 + -osync: not as easy as it seems Jan Kara
2009-01-13 14:46       ` Theodore Tso
2009-01-14  3:37       ` Fernando Luis Vázquez Cao
2009-01-14 10:35         ` Jan Kara
2009-01-14 13:21           ` Theodore Tso
2009-01-14 14:05             ` Jan Kara
2009-01-14 14:08               ` Jens Axboe
2009-01-14 14:34                 ` Theodore Tso
2009-01-14 14:43                   ` Jens Axboe
2009-02-12 16:43                 ` Eric Sandeen
2009-02-16 12:09                   ` Jens Axboe
2009-01-14 14:12               ` Theodore Tso
2009-01-14 14:37                 ` Jan Kara
2009-01-14 16:59                   ` Theodore Tso
2009-01-15 12:06                     ` Fernando Luis Vázquez Cao
2009-01-15 23:45                       ` Jan Kara
2009-01-16 12:31                         ` Fernando Luis Vázquez Cao
2009-01-16 13:55                           ` ext3: call blkdev_issue_flush on fsync Fernando Luis Vázquez Cao
2009-01-16 16:30                             ` Jan Kara
2009-01-17  9:47                               ` Fernando Luis Vázquez Cao
2009-01-17 10:00                                 ` Fernando Luis Vázquez Cao
2009-01-19 12:03                                   ` Jan Kara
2009-01-28  9:45                                     ` Fernando Luis Vázquez Cao [this message]
2009-01-28  9:55                                       ` Jan Kara
2009-02-12 10:33                                         ` Fernando Luis Vázquez Cao
2009-02-12 10:35                                           ` vfs: Improve readability off mount flag definitins by using offsets Fernando Luis Vázquez Cao
2009-02-12 10:36                                           ` vfs: Add MS_FLUSHONFSYNC mount flag Fernando Luis Vázquez Cao
2009-02-12 17:13                                             ` Eric Sandeen
2009-02-12 17:29                                               ` Jeff Garzik
2009-02-14 15:36                                                 ` Christoph Hellwig
2009-02-15  7:23                                                   ` Fernando Luis Vázquez Cao
2009-02-15 22:54                                                     ` Theodore Tso
2009-02-16  4:29                                                       ` Eric Sandeen
2009-02-16  7:47                                                       ` Fernando Luis Vázquez Cao
2009-02-16  7:47                                                         ` Fernando Luis Vázquez Cao
2009-02-12 21:23                                               ` Jan Kara
2009-02-12 21:30                                                 ` Eric Sandeen
2009-02-13  1:47                                                   ` Fernando Luis Vázquez Cao
2009-02-13  6:07                                                     ` Eric Sandeen
2009-02-13  2:23                                                   ` Theodore Tso
2009-02-22 14:15                                                     ` Pavel Machek
2009-02-22 20:51                                                       ` Eric Sandeen
2009-02-22 23:19                                                       ` Theodore Tso
2009-02-22 23:42                                                         ` Jeff Garzik
2009-02-22 23:46                                                           ` Jeff Garzik
2009-02-23  1:23                                                             ` Theodore Tso
2009-02-13  1:14                                               ` Fernando Luis Vázquez Cao
2009-02-13  6:20                                                 ` Eric Sandeen
2009-02-13 10:36                                                   ` Fernando Luis Vázquez Cao
2009-02-13 12:20                                                   ` Dave Chinner
2009-02-13 16:29                                                     ` Fernando Luis Vazquez Cao
2009-02-14 11:24                                                       ` Dave Chinner
2009-02-14 13:03                                                         ` Fernando Luis Vázquez Cao
2009-02-14 13:19                                                           ` Fernando Luis Vázquez Cao
2009-02-15  2:48                                                           ` Dave Chinner
2009-02-15  7:11                                                             ` Fernando Luis Vázquez Cao
2009-02-12 10:37                                           ` util-linux: Add new mount options flushonfsync and noflushonfsync to mount Fernando Luis Vázquez Cao
2009-02-12 10:38                                           ` util-linux: Add explanation for new mount options flushonfsync and noflushonfsync to mount(8) man page Fernando Luis Vázquez Cao
2009-02-12 10:38                                           ` block: Add block_flush_device() Fernando Luis Vázquez Cao
2009-02-12 10:39                                           ` ext3: call blkdev_issue_flush on fsync Fernando Luis Vázquez Cao
2009-02-12 10:40                                           ` ext4: " Fernando Luis Vázquez Cao
2009-02-15 22:46                                             ` Theodore Tso
2009-02-16  7:09                                               ` Fernando Luis Vázquez Cao
2009-02-16  7:25                                                 ` [PATCH 1/3] block: Add block_flush_device() Fernando Luis Vázquez Cao
2009-02-16  7:29                                                 ` [2/3] ext3: call block_flush_device() on fsync Fernando Luis Vázquez Cao
2009-02-16  7:31                                                 ` [PATCH 3/3] ext4: " Fernando Luis Vázquez Cao
2009-01-16 13:59                           ` ext4: call blkdev_issue_flush " Fernando Luis Vázquez Cao
2009-01-13 14:42   ` ext2 + -osync: not as easy as it seems Pavel Machek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1233135913.5399.57.camel@sebastian.kern.oss.ntt.co.jp \
    --to=fernando@oss.ntt.co.jp \
    --cc=alan@lxorguk.ukuu.org.uk \
    --cc=fernando@kic.ac.jp \
    --cc=jack@suse.cz \
    --cc=jens.axboe@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pavel@suse.cz \
    --cc=sandeen@redhat.com \
    --cc=tytso@MIT.EDU \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.