linux-fsdevel.vger.kernel.org archive mirror
From: Christoph Hellwig <hch@lst.de>
To: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>,
	Goldwyn Rodrigues <rgoldwyn@suse.com>,
	Josef Bacik <josef@toxicpanda.com>,
	Johannes Thumshirn <johannes.thumshirn@wdc.com>,
	David Sterba <dsterba@suse.com>,
	"linux-btrfs @ vger . kernel . org" <linux-btrfs@vger.kernel.org>,
	Filipe Manana <fdmanana@gmail.com>,
	Linux FS Devel <linux-fsdevel@vger.kernel.org>
Subject: Re: [RFC PATCH] btrfs: don't call btrfs_sync_file from iomap context
Date: Thu, 17 Sep 2020 08:42:38 +0200
Message-ID: <20200917064238.GA32441@lst.de>
In-Reply-To: <20200917062923.GV12096@dread.disaster.area>

On Thu, Sep 17, 2020 at 04:29:23PM +1000, Dave Chinner wrote:
> > inode_dio_wait really just waits for active I/O that writes to or reads
> > from the file.  It does not imply that the I/O is stable, just like
> > i_rwsem itself doesn't.
> 
> No, but iomap_dio_rw() considers an O_DSYNC write to be incomplete
> until it is stable so that it presents consistent behaviour to
> anything calling inode_dio_wait().

But the point is that inode_dio_wait does not care about that
"consistency".  It cares about when the I/O is done.  I know because I
wrote it (and I regret that, as we should have stuck with the non-owner
release of the rwsem, which makes a whole lot more sense).
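To make the distinction concrete, here is a minimal userspace sketch (not kernel code) loosely modeled on the kernel's inode_dio_begin()/inode_dio_end()/inode_dio_wait() helpers, which track in-flight direct I/O in the inode's i_dio_count.  The struct and function names below (dio_inode, dio_begin, dio_end, dio_wait) are hypothetical, and a pthread mutex/condvar stands in for the kernel's atomic counter and wakeup.  What it shows: a waiter only learns that no I/O is in flight; nothing in the counter says whether the data is stable.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical userspace model of an inode's in-flight direct I/O count. */
struct dio_inode {
    pthread_mutex_t lock;
    pthread_cond_t  idle;      /* broadcast when dio_count drops to zero */
    int             dio_count; /* number of direct I/Os currently in flight */
};

static void dio_inode_init(struct dio_inode *inode)
{
    pthread_mutex_init(&inode->lock, NULL);
    pthread_cond_init(&inode->idle, NULL);
    inode->dio_count = 0;
}

/* Called when a direct I/O is submitted. */
static void dio_begin(struct dio_inode *inode)
{
    pthread_mutex_lock(&inode->lock);
    inode->dio_count++;
    pthread_mutex_unlock(&inode->lock);
}

/*
 * Called when the data transfer itself has finished.  This says nothing
 * about durability: any sync/flush is a separate step.
 */
static void dio_end(struct dio_inode *inode)
{
    pthread_mutex_lock(&inode->lock);
    assert(inode->dio_count > 0);
    if (--inode->dio_count == 0)
        pthread_cond_broadcast(&inode->idle);
    pthread_mutex_unlock(&inode->lock);
}

/* Wait until no direct I/O is in flight, e.g. before a truncate. */
static void dio_wait(struct dio_inode *inode)
{
    pthread_mutex_lock(&inode->lock);
    while (inode->dio_count > 0)
        pthread_cond_wait(&inode->idle, &inode->lock);
    pthread_mutex_unlock(&inode->lock);
}
```

A truncate-like operation would call dio_wait() and then modify the file's mapping; whether a later metadata flush has happened yet is outside what this counter tracks.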

> 
> > Various file systems have historically done
> > the syncing outside i_rwsem and inode_dio_wait (in fact that is what the
> > fs/direct-io.c code does, so XFS did as well until a few years ago), and
> > that isn't a problem at all - we just can't return to userspace (or call
> > ki_complete for in-kernel users) before the data is stable on disk.
> 
> I'm really not caring about userspace here - we use inode_dio_wait()
> as an IO completion notification for the purposes of synchronising
> internal filesystem state before modifying user data via direct
> metadata manipulation. Hence I want sane, consistent, predictable IO
> completion notification behaviour regardless of the implementation
> path it goes through.

And none of that consistency matters.  Think about it:

 - an O_(D)SYNC write is nothing but a write plus a ranged fsync,
   even if we do some optimizations to speed up the fsync, e.g. by
   using the FUA flag
 - another fsync can come in at any time after we have completed a
   write (with or without O_SYNC)
 - so any synchronization using inode_dio_wait (or i_rwsem for that
   matter) must not care if an fsync runs in parallel.
 - take a look at where we call inode_dio_wait to verify this - the
   prime original use case was truncate, as we can't have I/O in
   progress while truncating.  We later extended it to all the more
   complicated truncate-like operations such as hole punching, extent
   insert and collapse, etc.  But in all those cases what matters is
   the actual I/O, not the sync.  Because we did direct I/O, the
   page cache side of the sync doesn't matter to start with (but
   the callers all invalidate it anyway), so what matters is the
   metadata flush, aka the log force in the XFS case.  And we
   absolutely do not need that to complete before inode_dio_wait
   returns.
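Seen from userspace, the equivalence in the first point can be sketched with plain POSIX calls: an O_DSYNC pwrite() versus an ordinary pwrite() followed by fdatasync().  The helper names and the file path used with them are hypothetical.  Two caveats: fdatasync() syncs the whole file, whereas the kernel's O_DSYNC path only syncs the range just written (generic_write_sync() calls vfs_fsync_range()), and neither variant here uses O_DIRECT.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Variant A: the kernel makes the data durable as part of the write. */
static ssize_t write_dsync(const char *path, const void *buf, size_t len,
                           off_t off)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_DSYNC, 0644);
    if (fd < 0)
        return -1;
    ssize_t ret = pwrite(fd, buf, len, off);
    close(fd);
    return ret;
}

/* Variant B: a plain write, then an explicit data sync as a second step. */
static ssize_t write_then_sync(const char *path, const void *buf, size_t len,
                               off_t off)
{
    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    ssize_t ret = pwrite(fd, buf, len, off);
    if (ret >= 0 && fdatasync(fd) < 0)
        ret = -1; /* data written but not known to be stable */
    close(fd);
    return ret;
}
```

In both variants the durability step is layered on top of a write that has already transferred its data, which is the sense in which an O_(D)SYNC write is a write plus a sync.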

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

Thread overview: 24+ messages
     [not found] <20200901130644.12655-1-johannes.thumshirn@wdc.com>
2020-09-01 15:11 ` [RFC PATCH] btrfs: don't call btrfs_sync_file from iomap context Josef Bacik
2020-09-01 17:45   ` Darrick J. Wong
2020-09-01 17:55     ` Josef Bacik
2020-09-01 21:46   ` Dave Chinner
2020-09-01 22:19     ` Josef Bacik
2020-09-01 23:58       ` Dave Chinner
2020-09-02  0:22         ` Josef Bacik
2020-09-02  7:12           ` Johannes Thumshirn
2020-09-02 11:10             ` Josef Bacik
2020-09-02 16:29               ` Darrick J. Wong
2020-09-02 16:47                 ` Josef Bacik
2020-09-02 11:44         ` Matthew Wilcox
2020-09-02 12:20           ` Dave Chinner
2020-09-02 12:42             ` Josef Bacik
2020-09-03  2:28               ` Dave Chinner
2020-09-03  9:49                 ` Filipe Manana
2020-09-03 16:32   ` Christoph Hellwig
2020-09-03 16:46     ` Josef Bacik
2020-09-07  0:04     ` Dave Chinner
2020-09-15 21:48       ` Goldwyn Rodrigues
2020-09-17  3:09         ` Dave Chinner
2020-09-17  5:52           ` Christoph Hellwig
2020-09-17  6:29             ` Dave Chinner
2020-09-17  6:42               ` Christoph Hellwig [this message]
