Linux-XFS Archive on
From: Brian Foster <>
To: Christoph Hellwig <>
Cc:, Dave Chinner <>,
	Zorro Lang <>
Subject: Re: [PATCH] xfs: serialize unaligned dio writes against all other dio writes
Date: Mon, 25 Mar 2019 09:51:25 -0400
Message-ID: <20190325135124.GD52167@bfoster> (raw)
In-Reply-To: <>

On Mon, Mar 25, 2019 at 04:06:46AM -0700, Christoph Hellwig wrote:
> >  	if (unaligned_io) {
> > +		/* unaligned dio always waits, bail */
> > +		if (iocb->ki_flags & IOCB_NOWAIT)
> > +			return -EAGAIN;
> > +		else
> >  			inode_dio_wait(inode);
> No need for the else here.


> >  	} else if (iolock == XFS_IOLOCK_EXCL) {
> >  		xfs_ilock_demote(ip, XFS_IOLOCK_EXCL);
> >  		iolock = XFS_IOLOCK_SHARED;
> > @@ -548,6 +549,8 @@ xfs_file_dio_aio_write(
> >  
> >  	trace_xfs_file_direct_write(ip, count, iocb->ki_pos);
> >  	ret = iomap_dio_rw(iocb, from, &xfs_iomap_ops, xfs_dio_write_end_io);
> > +	if (unaligned_io && !is_sync_kiocb(iocb))
> > +		inode_dio_wait(inode);
> Instead of explicitly waiting here I'd much rather just mark the
> I/O as sync before submitting it.  The only thing needed for that
> is to clear iocb->ki_complete.  To avoid too much low-level hacking
> that is probably best done with a:
> static inline void mark_kiocb_sync(struct kiocb *kiocb)
> {
> 	kiocb->ki_complete = NULL;
> }
> helper in fs.h.

It's not quite that simple.

FWIW, the discussion (between Dave and me) for how best to solve this
started offline prior to sending the patch and pretty much started with
the idea of changing the async I/O to sync as you suggest here. I backed
off from that because it's too subtle given the semantics between the
higher level aio code and lower level dio code for async I/O. By that I
mean either can be responsible for calling the ->ki_complete() callback
in the iocb on I/O completion.

IOW, if we receive an async direct I/O, clear ->ki_complete() as you
describe above and submit it, the dio code will wait on I/O and return
the size of the I/O on successful completion. It will not have called
->ki_complete(), however. Rather, the >0 return value indicates that
aio_rw_done() must call ->ki_complete() after xfs_file_write_iter()
returns, but we would have already cleared the function pointer.
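To make the ownership problem concrete, here is a rough userspace model
of the sequence described above. It is only a sketch: every model_* name
is a hypothetical stand-in, the real struct kiocb and aio_rw_done() carry
far more state, and the -1 return replaces what would be a call through
a NULL function pointer in the kernel.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_EIOCBQUEUED 529	/* stand-in for the kernel's EIOCBQUEUED */

/* Simplified stand-in for struct kiocb: only the completion hook is modeled. */
struct model_kiocb {
	void (*ki_complete)(struct model_kiocb *iocb, long ret);
	int completions;	/* number of times the hook has fired */
};

/* Stand-in for the aio layer's completion handler. */
static void model_aio_complete(struct model_kiocb *iocb, long ret)
{
	(void)ret;
	iocb->completions++;
}

/* An iocb without a completion hook is treated as synchronous. */
static int model_is_sync(const struct model_kiocb *iocb)
{
	return iocb->ki_complete == NULL;
}

/*
 * Stand-in for the dio layer: a sync iocb is waited on and the byte
 * count is returned WITHOUT calling the hook; an async iocb is queued.
 */
static long model_dio_rw(struct model_kiocb *iocb, long count)
{
	if (model_is_sync(iocb))
		return count;
	return -MODEL_EIOCBQUEUED;
}

/*
 * Stand-in for aio_rw_done(): any result other than "queued" must be
 * completed here. Returns -1 where the kernel would call a NULL hook.
 */
static int model_aio_rw_done(struct model_kiocb *iocb, long ret)
{
	if (ret == -MODEL_EIOCBQUEUED)
		return 0;	/* the dio layer owns completion */
	if (iocb->ki_complete == NULL)
		return -1;	/* hook was cleared underneath the aio layer */
	iocb->ki_complete(iocb, ret);
	return 0;
}

/* The suggested approach: clear the hook, then submit. */
static long model_write_clearing_hook(struct model_kiocb *iocb, long count)
{
	iocb->ki_complete = NULL;	/* what mark_kiocb_sync() would do */
	return model_dio_rw(iocb, count);
}
```

In this model, a 4096-byte write submitted through
model_write_clearing_hook() returns 4096, and the following
model_aio_rw_done() call then finds no hook to invoke, so the aio
completion is never delivered.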

I think it is technically possible to use this technique by clearing and
restoring ->ki_complete(), but in general we've visited this "change the
I/O type" approach twice now and we've (collectively) got it wrong both
times (the first error in thinking was that XFS would need to call
->ki_complete()). IMO, this demonstrates that it's not worth the
complexity to insert ourselves into this dependency chain when we can
accomplish the same thing with a simple dio wait call.
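For contrast, a similarly rough userspace sketch of the dio wait
approach follows. Again, the sketch_* names are hypothetical, the inode
tracks at most one in-flight I/O, and completion is simulated inline
inside the wait rather than from I/O completion context, so this does
not model the kernel's exact ordering. The point is only that the iocb
stays async, the dio layer keeps ownership of ->ki_complete(), and the
hook fires exactly once.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_EIOCBQUEUED 529	/* stand-in for the kernel's EIOCBQUEUED */

/* Simplified stand-in for struct kiocb. */
struct sketch_kiocb {
	void (*ki_complete)(struct sketch_kiocb *iocb, long ret);
	long pending;		/* bytes still in flight, 0 if none */
	int completions;	/* number of times the hook has fired */
};

/* Simplified stand-in for the inode: at most one in-flight dio. */
struct sketch_inode {
	struct sketch_kiocb *inflight;
};

static void sketch_complete(struct sketch_kiocb *iocb, long ret)
{
	(void)ret;
	iocb->completions++;
}

/* Async submission: queue the I/O; the lower layer owns completion. */
static long sketch_dio_submit(struct sketch_inode *inode,
			      struct sketch_kiocb *iocb, long count)
{
	assert(iocb->ki_complete != NULL);	/* hook is left untouched */
	iocb->pending = count;
	inode->inflight = iocb;
	return -SKETCH_EIOCBQUEUED;
}

/*
 * Stand-in for inode_dio_wait(): returns once no dio is in flight.
 * The completion that would arrive from I/O context is simulated here.
 */
static void sketch_dio_wait(struct sketch_inode *inode)
{
	struct sketch_kiocb *iocb = inode->inflight;

	if (iocb == NULL)
		return;
	inode->inflight = NULL;
	iocb->ki_complete(iocb, iocb->pending);
	iocb->pending = 0;
}

/* Stand-in for aio_rw_done(): completes anything that was not queued. */
static void sketch_aio_rw_done(struct sketch_kiocb *iocb, long ret)
{
	if (ret != -SKETCH_EIOCBQUEUED)
		iocb->ki_complete(iocb, ret);
}

/* Unaligned write path as in the patch: submit async, then wait. */
static long sketch_unaligned_write(struct sketch_inode *inode,
				   struct sketch_kiocb *iocb, long count)
{
	long ret = sketch_dio_submit(inode, iocb, count);

	sketch_dio_wait(inode);
	return ret;
}
```

Here the write returns -SKETCH_EIOCBQUEUED, sketch_aio_rw_done() does
nothing further, and the completion count ends up at exactly one.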



