All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>,
	linux-fsdevel@vger.kernel.org, xfs@oss.sgi.com,
	hch@infradead.org, Jeff Moyer <jmoyer@redhat.com>,
	Al Viro <viro@ZenIV.linux.org.uk>,
	linux-ext4@vger.kernel.org, Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH 1/2] direct-io: Implement generic deferred AIO completions
Date: Mon, 12 Aug 2013 18:14:55 +0200	[thread overview]
Message-ID: <20130812161455.GA19471@quack.suse.cz> (raw)
In-Reply-To: <20130716210027.GA9595@quack.suse.cz>

  Hi Dave,

  I remembered about this patch set and realized I didn't get reply from
you regarding the following question (see quoted email below for details):
Do you really need to defer completion of appending direct IO? Because
generic code makes sure appending direct IO isn't async and thus
dio_complete() -> xfs_end_io_direct_write() gets called directly from
do_blockdev_direct_IO(). I.e. from a normal context and not from interrupt.

I've already addressed rest of your comments so this is the only item that
is remaining.

On Tue 16-07-13 23:00:27, Jan Kara wrote:
> > > -static ssize_t dio_complete(struct dio *dio, loff_t offset, ssize_t ret, bool is_async)
> > > +static ssize_t dio_complete(struct dio *dio, loff_t offset, ssize_t ret,
> > > +		bool is_async)
> > >  {
> > >  	ssize_t transferred = 0;
> > >  
> > > @@ -258,19 +262,26 @@ static ssize_t dio_complete(struct dio *dio, loff_t offset, ssize_t ret, bool is
> > >  	if (ret == 0)
> > >  		ret = transferred;
> > >  
> > > -	if (dio->end_io && dio->result) {
> > > -		dio->end_io(dio->iocb, offset, transferred,
> > > -			    dio->private, ret, is_async);
> > > -	} else {
> > > -		inode_dio_done(dio->inode);
> > > -		if (is_async)
> > > -			aio_complete(dio->iocb, ret, 0);
> > > -	}
> > > +	if (dio->end_io && dio->result)
> > > +		dio->end_io(dio->iocb, offset, transferred, dio->private);
> > 
> > Ok, so by here we are assuming all filesystem IO completion
> > processing is completed.
>   Yes.
> 
> > > +
> > > +	inode_dio_done(dio->inode);
> > > +	if (is_async)
> > > +		aio_complete(dio->iocb, ret, 0);
> > >  
> > > +	kmem_cache_free(dio_cache, dio);
> > >  	return ret;
> > 
> > Hmmm. I started wonder if this is valid, because XFS is supposed to
> > be doing workqueue based IO completion for appending writes and they
> > are supposed to be run in a workqueue.
> > 
> > But, looking at the XFS code, there's actually a bug in the direct
> > IO completion code and we are not defering completion to a workqueue
> > like we should be for the append case.  This code in
> > xfs_finish_ioend() demonstrates the intent:
> > 
> >                 if (ioend->io_type == XFS_IO_UNWRITTEN)
> >                         queue_work(mp->m_unwritten_workqueue, &ioend->io_work);
> >                 else if (ioend->io_append_trans ||
> > >>>>>>                   (ioend->io_isdirect && xfs_ioend_is_append(ioend)))
> >                         queue_work(mp->m_data_workqueue, &ioend->io_work);
> > 
> > The problem is that we only ever call xfs_finish_ioend() if is_async
> > is true, and that will never be true for direct IO beyond the current
> > EOF. That's a bug, and is Bad(tm).
> > 
> > [ Interestingly, it turns out that dio->is_async is never set for
> > writes beyond EOF because of filesystems that update file sizes
> > before data IO completion occurs (stale data exposure issues). For
> > XFS, that can't happen and so dio->is_async could always be set.... ]
> > 
> > What this means is that there's a condition for work queue deferral
> > of DIO IO completion that this patch doesn't handle. Setting deferral
> > only on unwritten extents like this:
> > 
> > > @@ -1292,8 +1281,10 @@ __xfs_get_blocks(
> > >  		if (create || !ISUNWRITTEN(&imap))
> > >  			xfs_map_buffer(inode, bh_result, &imap, offset);
> > >  		if (create && ISUNWRITTEN(&imap)) {
> > > -			if (direct)
> > > +			if (direct) {
> > >  				bh_result->b_private = inode;
> > > +				set_buffer_defer_completion(bh_result);
> > > +			}
> > >  			set_buffer_unwritten(bh_result);
> > >  		}
> > >  	}
> > 
> > misses that case. I suspect Christoph's original patch predated the
> > changes to XFS that added transactional file size updates to IO
> > completion and so didn't take it into account.
>   OK, thanks for catching this. Doing the i_size check in
> _xfs_get_blocks() would be somewhat cumbersome I presume so I guess we can
> handle that case by adding __blockdev_direct_IO() flag to defer dio
> completion to a workqueue. XFS can then set the flag when it sees it needs
> and i_size update. OK?
> 
> > > @@ -1390,9 +1381,7 @@ xfs_end_io_direct_write(
> > >  	struct kiocb		*iocb,
> > >  	loff_t			offset,
> > >  	ssize_t			size,
> > > -	void			*private,
> > > -	int			ret,
> > > -	bool			is_async)
> > > +	void			*private)
> > >  {
> > >  	struct xfs_ioend	*ioend = iocb->private;
> > >  
> > > @@ -1414,17 +1403,10 @@ xfs_end_io_direct_write(
> > >  
> > >  	ioend->io_offset = offset;
> > >  	ioend->io_size = size;
> > > -	ioend->io_iocb = iocb;
> > > -	ioend->io_result = ret;
> > >  	if (private && size > 0)
> > >  		ioend->io_type = XFS_IO_UNWRITTEN;
> > >  
> > > -	if (is_async) {
> > > -		ioend->io_isasync = 1;
> > > -		xfs_finish_ioend(ioend);
> > > -	} else {
> > > -		xfs_finish_ioend_sync(ioend);
> > > -	}
> > > +	xfs_finish_ioend_sync(ioend);
> > >  }
> > 
> > As i mentioned, the existing code here is buggy. What we should be
> > doing here is:
> > 
> > 	if (is_async) {
> > 		ioend->io_isasync = 1;
> > 		xfs_finish_ioend(ioend);
> > 	} else if (xfs_ioend_is_append(ioend))) {
> > 		xfs_finish_ioend(ioend);
> > 	} else {
> > 		xfs_finish_ioend_sync(ioend);
> > 	}
>   Umm, I started to wonder why do you actually need to defer the completion
> for appending ioend. Because if DIO isn't async, dio_complete() won't be
> called from interrupt context but from __blockdev_direct_IO(). So it seems
> you can do everything you need directly there without deferring to a
> workqueue. But maybe there's some locking subtlety I'm missing.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

WARNING: multiple messages have this Message-ID (diff)
From: Jan Kara <jack@suse.cz>
To: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>,
	xfs@oss.sgi.com, hch@infradead.org,
	Jeff Moyer <jmoyer@redhat.com>, Al Viro <viro@ZenIV.linux.org.uk>,
	linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
	Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH 1/2] direct-io: Implement generic deferred AIO completions
Date: Mon, 12 Aug 2013 18:14:55 +0200	[thread overview]
Message-ID: <20130812161455.GA19471@quack.suse.cz> (raw)
In-Reply-To: <20130716210027.GA9595@quack.suse.cz>

  Hi Dave,

  I remembered about this patch set and realized I didn't get reply from
you regarding the following question (see quoted email below for details):
Do you really need to defer completion of appending direct IO? Because
generic code makes sure appending direct IO isn't async and thus
dio_complete() -> xfs_end_io_direct_write() gets called directly from
do_blockdev_direct_IO(). I.e. from a normal context and not from interrupt.

I've already addressed rest of your comments so this is the only item that
is remaining.

On Tue 16-07-13 23:00:27, Jan Kara wrote:
> > > -static ssize_t dio_complete(struct dio *dio, loff_t offset, ssize_t ret, bool is_async)
> > > +static ssize_t dio_complete(struct dio *dio, loff_t offset, ssize_t ret,
> > > +		bool is_async)
> > >  {
> > >  	ssize_t transferred = 0;
> > >  
> > > @@ -258,19 +262,26 @@ static ssize_t dio_complete(struct dio *dio, loff_t offset, ssize_t ret, bool is
> > >  	if (ret == 0)
> > >  		ret = transferred;
> > >  
> > > -	if (dio->end_io && dio->result) {
> > > -		dio->end_io(dio->iocb, offset, transferred,
> > > -			    dio->private, ret, is_async);
> > > -	} else {
> > > -		inode_dio_done(dio->inode);
> > > -		if (is_async)
> > > -			aio_complete(dio->iocb, ret, 0);
> > > -	}
> > > +	if (dio->end_io && dio->result)
> > > +		dio->end_io(dio->iocb, offset, transferred, dio->private);
> > 
> > Ok, so by here we are assuming all filesystem IO completion
> > processing is completed.
>   Yes.
> 
> > > +
> > > +	inode_dio_done(dio->inode);
> > > +	if (is_async)
> > > +		aio_complete(dio->iocb, ret, 0);
> > >  
> > > +	kmem_cache_free(dio_cache, dio);
> > >  	return ret;
> > 
> > Hmmm. I started wonder if this is valid, because XFS is supposed to
> > be doing workqueue based IO completion for appending writes and they
> > are supposed to be run in a workqueue.
> > 
> > But, looking at the XFS code, there's actually a bug in the direct
> > IO completion code and we are not defering completion to a workqueue
> > like we should be for the append case.  This code in
> > xfs_finish_ioend() demonstrates the intent:
> > 
> >                 if (ioend->io_type == XFS_IO_UNWRITTEN)
> >                         queue_work(mp->m_unwritten_workqueue, &ioend->io_work);
> >                 else if (ioend->io_append_trans ||
> > >>>>>>                   (ioend->io_isdirect && xfs_ioend_is_append(ioend)))
> >                         queue_work(mp->m_data_workqueue, &ioend->io_work);
> > 
> > The problem is that we only ever call xfs_finish_ioend() if is_async
> > is true, and that will never be true for direct IO beyond the current
> > EOF. That's a bug, and is Bad(tm).
> > 
> > [ Interestingly, it turns out that dio->is_async is never set for
> > writes beyond EOF because of filesystems that update file sizes
> > before data IO completion occurs (stale data exposure issues). For
> > XFS, that can't happen and so dio->is_async could always be set.... ]
> > 
> > What this means is that there's a condition for work queue deferral
> > of DIO IO completion that this patch doesn't handle. Setting deferral
> > only on unwritten extents like this:
> > 
> > > @@ -1292,8 +1281,10 @@ __xfs_get_blocks(
> > >  		if (create || !ISUNWRITTEN(&imap))
> > >  			xfs_map_buffer(inode, bh_result, &imap, offset);
> > >  		if (create && ISUNWRITTEN(&imap)) {
> > > -			if (direct)
> > > +			if (direct) {
> > >  				bh_result->b_private = inode;
> > > +				set_buffer_defer_completion(bh_result);
> > > +			}
> > >  			set_buffer_unwritten(bh_result);
> > >  		}
> > >  	}
> > 
> > misses that case. I suspect Christoph's original patch predated the
> > changes to XFS that added transactional file size updates to IO
> > completion and so didn't take it into account.
>   OK, thanks for catching this. Doing the i_size check in
> _xfs_get_blocks() would be somewhat cumbersome I presume so I guess we can
> handle that case by adding __blockdev_direct_IO() flag to defer dio
> completion to a workqueue. XFS can then set the flag when it sees it needs
> and i_size update. OK?
> 
> > > @@ -1390,9 +1381,7 @@ xfs_end_io_direct_write(
> > >  	struct kiocb		*iocb,
> > >  	loff_t			offset,
> > >  	ssize_t			size,
> > > -	void			*private,
> > > -	int			ret,
> > > -	bool			is_async)
> > > +	void			*private)
> > >  {
> > >  	struct xfs_ioend	*ioend = iocb->private;
> > >  
> > > @@ -1414,17 +1403,10 @@ xfs_end_io_direct_write(
> > >  
> > >  	ioend->io_offset = offset;
> > >  	ioend->io_size = size;
> > > -	ioend->io_iocb = iocb;
> > > -	ioend->io_result = ret;
> > >  	if (private && size > 0)
> > >  		ioend->io_type = XFS_IO_UNWRITTEN;
> > >  
> > > -	if (is_async) {
> > > -		ioend->io_isasync = 1;
> > > -		xfs_finish_ioend(ioend);
> > > -	} else {
> > > -		xfs_finish_ioend_sync(ioend);
> > > -	}
> > > +	xfs_finish_ioend_sync(ioend);
> > >  }
> > 
> > As i mentioned, the existing code here is buggy. What we should be
> > doing here is:
> > 
> > 	if (is_async) {
> > 		ioend->io_isasync = 1;
> > 		xfs_finish_ioend(ioend);
> > 	} else if (xfs_ioend_is_append(ioend))) {
> > 		xfs_finish_ioend(ioend);
> > 	} else {
> > 		xfs_finish_ioend_sync(ioend);
> > 	}
>   Umm, I started to wonder why do you actually need to defer the completion
> for appending ioend. Because if DIO isn't async, dio_complete() won't be
> called from interrupt context but from __blockdev_direct_IO(). So it seems
> you can do everything you need directly there without deferring to a
> workqueue. But maybe there's some locking subtlety I'm missing.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

  reply	other threads:[~2013-08-12 16:14 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-07-10 22:02 [PATCH 1/2] direct-io: Implement generic deferred AIO completions Jan Kara
2013-07-10 22:02 ` Jan Kara
2013-07-10 22:02 ` [PATCH 2/2] direct-io: Handle O_(D)SYNC AIO Jan Kara
2013-07-10 22:02   ` Jan Kara
2013-07-12  0:44 ` [PATCH 1/2] direct-io: Implement generic deferred AIO completions Dave Chinner
2013-07-16 21:00   ` Jan Kara
2013-07-16 21:00     ` Jan Kara
2013-08-12 16:14     ` Jan Kara [this message]
2013-08-12 16:14       ` Jan Kara
2013-08-13  0:11       ` Dave Chinner
2013-08-13  0:11         ` Dave Chinner
2013-08-01  8:17   ` Christoph Hellwig
2013-08-14  9:10 [PATCH 0/2 v2] Fix O_SYNC AIO DIO Jan Kara
2013-08-14  9:10 ` [PATCH 1/2] direct-io: Implement generic deferred AIO completions Jan Kara
2013-09-04 13:04 [PATCH 0/2 v3] Fix O_SYNC AIO DIO Jan Kara
2013-09-04 13:04 ` [PATCH 1/2] direct-io: Implement generic deferred AIO completions Jan Kara
2013-09-04 13:04   ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130812161455.GA19471@quack.suse.cz \
    --to=jack@suse.cz \
    --cc=david@fromorbit.com \
    --cc=hch@infradead.org \
    --cc=hch@lst.de \
    --cc=jmoyer@redhat.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=viro@ZenIV.linux.org.uk \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.