All of lore.kernel.org
* Query about DIO/AIO WRITE throttling and ext4 serialization
From: Vivek Goyal @ 2011-06-01 21:50 UTC
  To: linux-ext4

Hi,

If I throttle a DIO/AIO WRITE bio at the block device in a cgroup, will it
lead to any kind of serialization in the ext4 file system? IOW, is there any
filesystem operation which will wait for that DIO/AIO WRITE to finish
before other filesystem operations can make progress (fsync, journalling, etc.)?

I know that when throttling buffered WRITEs I do run into serialization
issues, hence I was thinking of moving the throttling of buffered WRITEs
to the point where they enter the page cache, to get rid of issues related
to filesystem serialization. I am not sure about DIO/AIO WRITEs, hence the
question.

Thanks
Vivek

* Re: Query about DIO/AIO WRITE throttling and ext4 serialization
From: Dave Chinner @ 2011-06-02  1:22 UTC
  To: Vivek Goyal; +Cc: linux-ext4

On Wed, Jun 01, 2011 at 05:50:49PM -0400, Vivek Goyal wrote:
> Hi,
> 
> If I throttle a DIO/AIO WRITE bio at the block device in a cgroup, will it
> lead to any kind of serialization in the ext4 file system? IOW, is there any
> filesystem operation which will wait for that DIO/AIO WRITE to finish
> before other filesystem operations can make progress (fsync, journalling, etc.)?

Truncate?

(XFS explicitly serialises truncate against in flight DIO,
regardless of whether ext4 does.)

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: Query about DIO/AIO WRITE throttling and ext4 serialization
From: Vivek Goyal @ 2011-06-02 14:17 UTC
  To: Dave Chinner; +Cc: linux-ext4

On Thu, Jun 02, 2011 at 11:22:09AM +1000, Dave Chinner wrote:
> On Wed, Jun 01, 2011 at 05:50:49PM -0400, Vivek Goyal wrote:
> > Hi,
> > 
> > If I throttle a DIO/AIO WRITE bio at the block device in a cgroup, will it
> > lead to any kind of serialization in the ext4 file system? IOW, is there any
> > filesystem operation which will wait for that DIO/AIO WRITE to finish
> > before other filesystem operations can make progress (fsync, journalling, etc.)?
> 
> Truncate?
> 
> (XFS explicitly serialises truncate against in flight DIO,
> regardless of whether ext4 does.)
> 

Dave,

Does this serialization happen only against the particular inode on which
truncate has been called? If yes, then I think I will still be fine,
as in the common use case I am not expecting much sharing of inodes across
cgroups.

Thanks
Vivek

* Re: Query about DIO/AIO WRITE throttling and ext4 serialization
From: Vivek Goyal @ 2011-06-02 14:36 UTC
  To: Dave Chinner; +Cc: linux-ext4

On Thu, Jun 02, 2011 at 10:17:16AM -0400, Vivek Goyal wrote:
> On Thu, Jun 02, 2011 at 11:22:09AM +1000, Dave Chinner wrote:
> > On Wed, Jun 01, 2011 at 05:50:49PM -0400, Vivek Goyal wrote:
> > > Hi,
> > > 
> > > If I throttle a DIO/AIO WRITE bio at the block device in a cgroup, will it
> > > lead to any kind of serialization in the ext4 file system? IOW, is there any
> > > filesystem operation which will wait for that DIO/AIO WRITE to finish
> > > before other filesystem operations can make progress (fsync, journalling, etc.)?
> > 
> > Truncate?
> > 
> > (XFS explicitly serialises truncate against in flight DIO,
> > regardless of whether ext4 does.)
> > 
> 
> Dave,
> 
> Does this serialization happen only against the particular inode on which
> truncate has been called? If yes, then I think I will still be fine,
> as in the common use case I am not expecting much sharing of inodes across
> cgroups.

Dave,

I did a quick test of throttling a direct IO on one file and then
doing "truncate -s 40 testfile" on a different file in a different
cgroup, and it seems to work fine.

But I seem to be having issues with "sync". It looks like on ext4, if
I throttle a DIO, sync does not hang, but on XFS it does. I am
wondering if XFS is waiting for all in-flight DIO to finish before
sync completes.

Thanks
Vivek

* Re: Query about DIO/AIO WRITE throttling and ext4 serialization
From: Vivek Goyal @ 2011-06-02 15:56 UTC
  To: Dave Chinner; +Cc: linux-ext4

On Thu, Jun 02, 2011 at 10:36:33AM -0400, Vivek Goyal wrote:
> On Thu, Jun 02, 2011 at 10:17:16AM -0400, Vivek Goyal wrote:
> > On Thu, Jun 02, 2011 at 11:22:09AM +1000, Dave Chinner wrote:
> > > On Wed, Jun 01, 2011 at 05:50:49PM -0400, Vivek Goyal wrote:
> > > > Hi,
> > > > 
> > > > If I throttle a DIO/AIO WRITE bio at the block device in a cgroup, will it
> > > > lead to any kind of serialization in the ext4 file system? IOW, is there any
> > > > filesystem operation which will wait for that DIO/AIO WRITE to finish
> > > > before other filesystem operations can make progress (fsync, journalling, etc.)?
> > > 
> > > Truncate?
> > > 
> > > (XFS explicitly serialises truncate against in flight DIO,
> > > regardless of whether ext4 does.)
> > > 
> > 
> > Dave,
> > 
> > Does this serialization happen only against the particular inode on which
> > truncate has been called? If yes, then I think I will still be fine,
> > as in the common use case I am not expecting much sharing of inodes across
> > cgroups.
> 
> Dave,
> 
> I did a quick test of throttling a direct IO on one file and then
> doing "truncate -s 40 testfile" on a different file in a different
> cgroup, and it seems to work fine.
> 
> But I seem to be having issues with "sync". It looks like on ext4, if
> I throttle a DIO, sync does not hang, but on XFS it does. I am
> wondering if XFS is waiting for all in-flight DIO to finish before
> sync completes.

"sync" on XFS seems to be livelocking as long as a DIO write operation
is going on, and the same does not happen on ext4.

I ran "aio-stress -O aiofile1 -s 4G" and in another window I ran "sync",
and it does not finish until aio-stress has finished. ext4, on the
other hand, seems to be fine: sync finishes earlier.

Thanks
Vivek

* Re: Query about DIO/AIO WRITE throttling and ext4 serialization
From: Dave Chinner @ 2011-06-02 23:46 UTC
  To: Vivek Goyal; +Cc: linux-ext4

On Thu, Jun 02, 2011 at 10:17:16AM -0400, Vivek Goyal wrote:
> On Thu, Jun 02, 2011 at 11:22:09AM +1000, Dave Chinner wrote:
> > On Wed, Jun 01, 2011 at 05:50:49PM -0400, Vivek Goyal wrote:
> > > Hi,
> > > 
> > > If I throttle a DIO/AIO WRITE bio at the block device in a cgroup, will it
> > > lead to any kind of serialization in the ext4 file system? IOW, is there any
> > > filesystem operation which will wait for that DIO/AIO WRITE to finish
> > > before other filesystem operations can make progress (fsync, journalling, etc.)?
> > 
> > Truncate?
> > 
> > (XFS explicitly serialises truncate against in flight DIO,
> > regardless of whether ext4 does.)
> > 
> 
> Dave,
> 
> Does this serialization happen only against the particular inode on which
> truncate has been called? If yes, then I think I will still be fine,
> as in the common use case I am not expecting much sharing of inodes across
> cgroups.

Same inode serialisation only.
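
The sketch below is a minimal userspace illustration of same-inode serialisation. It assumes a simple count-plus-condition-variable model, and the struct and function names are hypothetical; this is not the XFS or ext4 code. Truncate waits only for direct writes in flight on its own inode, so a DIO throttled on one file does not stall a truncate of another file.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-inode state: a count of direct writes in flight,
 * protected by a mutex, plus a condition variable to wait on. */
struct toy_inode {
	pthread_mutex_t	lock;
	pthread_cond_t	io_done;
	int		dio_in_flight;
};

static void toy_inode_init(struct toy_inode *ip)
{
	pthread_mutex_init(&ip->lock, NULL);
	pthread_cond_init(&ip->io_done, NULL);
	ip->dio_in_flight = 0;
}

/* A direct write against this inode has been submitted. */
static void dio_start(struct toy_inode *ip)
{
	pthread_mutex_lock(&ip->lock);
	ip->dio_in_flight++;
	pthread_mutex_unlock(&ip->lock);
}

/* A direct write completed; wake waiters once the count drains. */
static void dio_end(struct toy_inode *ip)
{
	pthread_mutex_lock(&ip->lock);
	if (--ip->dio_in_flight == 0)
		pthread_cond_broadcast(&ip->io_done);
	pthread_mutex_unlock(&ip->lock);
}

/* Truncate drains direct I/O on this inode only; DIO on any other
 * inode is never consulted. */
static void toy_truncate(struct toy_inode *ip)
{
	pthread_mutex_lock(&ip->lock);
	while (ip->dio_in_flight > 0)
		pthread_cond_wait(&ip->io_done, &ip->lock);
	/* ... shrink the file here while still holding the lock ... */
	pthread_mutex_unlock(&ip->lock);
}

/* Non-blocking probe: would toy_truncate() have to wait right now? */
static bool truncate_would_wait(struct toy_inode *ip)
{
	pthread_mutex_lock(&ip->lock);
	bool busy = ip->dio_in_flight > 0;
	pthread_mutex_unlock(&ip->lock);
	return busy;
}
```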

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: Query about DIO/AIO WRITE throttling and ext4 serialization
From: Dave Chinner @ 2011-06-02 23:51 UTC
  To: Vivek Goyal; +Cc: linux-ext4

On Thu, Jun 02, 2011 at 11:56:10AM -0400, Vivek Goyal wrote:
> On Thu, Jun 02, 2011 at 10:36:33AM -0400, Vivek Goyal wrote:
> > On Thu, Jun 02, 2011 at 10:17:16AM -0400, Vivek Goyal wrote:
> > > On Thu, Jun 02, 2011 at 11:22:09AM +1000, Dave Chinner wrote:
> > > > On Wed, Jun 01, 2011 at 05:50:49PM -0400, Vivek Goyal wrote:
> > > > > Hi,
> > > > > 
> > > > > If I throttle a DIO/AIO WRITE bio at the block device in a cgroup, will it
> > > > > lead to any kind of serialization in the ext4 file system? IOW, is there any
> > > > > filesystem operation which will wait for that DIO/AIO WRITE to finish
> > > > > before other filesystem operations can make progress (fsync, journalling, etc.)?
> > > > 
> > > > Truncate?
> > > > 
> > > > (XFS explicitly serialises truncate against in flight DIO,
> > > > regardless of whether ext4 does.)
> > > > 
> > > 
> > > Dave,
> > > 
> > > Does this serialization happen only against the particular inode on which
> > > truncate has been called? If yes, then I think I will still be fine,
> > > as in the common use case I am not expecting much sharing of inodes across
> > > cgroups.
> > 
> > Dave,
> > 
> > I did a quick test of throttling a direct IO on one file and then
> > doing "truncate -s 40 testfile" on a different file in a different
> > cgroup, and it seems to work fine.
> > 
> > But I seem to be having issues with "sync". It looks like on ext4, if
> > I throttle a DIO, sync does not hang, but on XFS it does. I am
> > wondering if XFS is waiting for all in-flight DIO to finish before
> > sync completes.
>
> "sync" on XFS seems to be livelocking as long as a DIO write operation
> is going on, and the same does not happen on ext4.
>
> I ran "aio-stress -O aiofile1 -s 4G" and in another window I ran "sync",
> and it does not finish until aio-stress has finished. ext4, on the
> other hand, seems to be fine: sync finishes earlier.

On XFS, sync waits for the IO count on each inode to return to zero
before continuing.  If you are blasting concurrent AIO/DIO at a
file, then it is possible that the IO count never falls to zero.
It's questionable whether this is necessary, but ISTR that the
current behaviour has been there for a long time (though its
implementation has morphed about a bit).
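
The livelock can be shown with a deterministic toy model. The names are hypothetical and this is not XFS code. A single writer keeps a fixed number of direct writes in flight and submits a new one as soon as one completes, much like the continuous aio-stress run described above. Under that load, waiting for the count to reach zero never finishes.

For comparison, the sketch also shows one generic way to bound the wait: wait only for I/O that had already been submitted when the wait began. That alternative is an illustration only, not a description of what either filesystem does.

```c
/* Toy model of one inode's I/O count; hypothetical, not XFS code. */
struct io_model {
	long	submitted;	/* direct writes issued so far */
	long	completed;	/* direct writes finished so far */
};

static long in_flight(const struct io_model *m)
{
	return m->submitted - m->completed;
}

/* One time step of a busy writer: the oldest write completes and a new
 * one is submitted straight away, so the in-flight count never drops. */
static void tick(struct io_model *m)
{
	m->completed++;
	m->submitted++;
}

/* Wait until the count reaches zero (the behaviour described above).
 * Returns the number of ticks waited, or -1 after giving up. */
static long wait_for_zero(struct io_model *m, long max_ticks)
{
	for (long t = 0; t <= max_ticks; t++) {
		if (in_flight(m) == 0)
			return t;
		tick(m);
	}
	return -1;
}

/* Alternative: wait only for writes already submitted when the wait
 * began, so later submissions cannot extend it. */
static long wait_for_snapshot(struct io_model *m, long max_ticks)
{
	long target = m->submitted;

	for (long t = 0; t <= max_ticks; t++) {
		if (m->completed >= target)
			return t;
		tick(m);
	}
	return -1;
}
```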

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: Query about DIO/AIO WRITE throttling and ext4 serialization
From: Vivek Goyal @ 2011-06-03  0:27 UTC
  To: Dave Chinner; +Cc: linux-ext4

On Fri, Jun 03, 2011 at 09:51:53AM +1000, Dave Chinner wrote:

[..]
> > > Dave,
> > > 
> > > I did a quick test of throttling a direct IO on one file and then
> > > doing "truncate -s 40 testfile" on a different file in a different
> > > cgroup, and it seems to work fine.
> > > 
> > > But I seem to be having issues with "sync". It looks like on ext4, if
> > > I throttle a DIO, sync does not hang, but on XFS it does. I am
> > > wondering if XFS is waiting for all in-flight DIO to finish before
> > > sync completes.
> >
> > "sync" on XFS seems to be livelocking as long as a DIO write operation
> > is going on, and the same does not happen on ext4.
> >
> > I ran "aio-stress -O aiofile1 -s 4G" and in another window I ran "sync",
> > and it does not finish until aio-stress has finished. ext4, on the
> > other hand, seems to be fine: sync finishes earlier.
> 
> On XFS, sync waits for the IO count on each inode to return to zero
> before continuing.  If you are blasting concurrent AIO/DIO at a
> file, then it is possible that the IO count never falls to zero.
> It's questionable whether this is necessary, but ISTR that the
> current behaviour has been there for a long time (though its
> implementation has morphed about a bit).

In this case only a single thread is doing IO continuously. I assume
that for a database running on XFS it is not unreasonable to have prolonged
periods of continuous IO activity. In that case, with the above design,
sync will not finish until there is a momentary pause in IO. This
does not sound like the best design choice.

Thanks
Vivek

* Re: Query about DIO/AIO WRITE throttling and ext4 serialization
From: Ted Ts'o @ 2011-06-03  0:43 UTC
  To: Vivek Goyal; +Cc: Dave Chinner, linux-ext4

On Thu, Jun 02, 2011 at 08:27:14PM -0400, Vivek Goyal wrote:
> 
> In this case only a single thread is doing IO continuously. I assume
> that for a database running on XFS it is not unreasonable to have prolonged
> periods of continuous IO activity. In that case, with the above design,
> sync will not finish until there is a momentary pause in IO. This
> does not sound like the best design choice.

Sure, but under what circumstances would a database be blasting data
using AIO/DIO in one thread, and calling fsync() in another thread?
In practice I don't think this situation should ever arise.  If it
did, the question of which writes could be considered safely on stable
store and which could not would be undefined.  In fact, most
enterprise databases use preallocated files, so there's no
need at all to use fsync() and AIO/DIO at the same time.

     	       	   	       	       	  - Ted

* Re: Query about DIO/AIO WRITE throttling and ext4 serialization
From: Vivek Goyal @ 2011-06-03  0:54 UTC
  To: Ted Ts'o; +Cc: Dave Chinner, linux-ext4

On Thu, Jun 02, 2011 at 08:43:00PM -0400, Ted Ts'o wrote:
> On Thu, Jun 02, 2011 at 08:27:14PM -0400, Vivek Goyal wrote:
> > 
> > In this case only a single thread is doing IO continuously. I assume
> > that for a database running on XFS it is not unreasonable to have prolonged
> > periods of continuous IO activity. In that case, with the above design,
> > sync will not finish until there is a momentary pause in IO. This
> > does not sound like the best design choice.
> 
> Sure, but under what circumstances would a database be blasting data
> using AIO/DIO in one thread, and calling fsync() in another thread?
> In practice I don't think this situation should ever arise.  If it
> did, the question of which writes could be considered safely on stable
> store and which could not would be undefined.  In fact, most
> enterprise databases use preallocated files, so there's no
> need at all to use fsync() and AIO/DIO at the same time.

In this case I ran "sync" while aio-stress was doing O_DIRECT writes.
I don't really have a real-world example; I just cooked up a hypothetical
scenario.

I'm just wondering why ext4 and XFS behave differently and which
behavior is more appropriate. ext4 does not seem to wait for all
pending AIO/DIO to finish, while XFS does.

Thanks
Vivek

* Re: Query about DIO/AIO WRITE throttling and ext4 serialization
From: Christoph Hellwig @ 2011-06-03  1:02 UTC
  To: Vivek Goyal; +Cc: Ted Ts'o, Dave Chinner, linux-ext4

On Thu, Jun 02, 2011 at 08:54:03PM -0400, Vivek Goyal wrote:
> I'm just wondering why ext4 and XFS behave differently and which
> behavior is more appropriate. ext4 does not seem to wait for all
> pending AIO/DIO to finish, while XFS does.

They're both wrong.  Ext4 completely lacks support in fsync or sync
for catching pending unwritten extent conversions, and thus fails to obey
the data integrity guarantee.  XFS is being rather stupid about the
amount of synchronization it requires.  The untested patch below
should help with avoiding the synchronization if you're purely doing
overwrites:


Index: xfs/fs/xfs/linux-2.6/xfs_aops.c
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_aops.c	2011-06-03 09:54:52.964337556 +0900
+++ xfs/fs/xfs/linux-2.6/xfs_aops.c	2011-06-03 09:57:06.877674259 +0900
@@ -270,7 +270,7 @@ xfs_finish_ioend_sync(
  * (vs. incore size).
  */
 STATIC xfs_ioend_t *
-xfs_alloc_ioend(
+__xfs_alloc_ioend(
 	struct inode		*inode,
 	unsigned int		type)
 {
@@ -290,7 +290,6 @@ xfs_alloc_ioend(
 	ioend->io_inode = inode;
 	ioend->io_buffer_head = NULL;
 	ioend->io_buffer_tail = NULL;
-	atomic_inc(&XFS_I(ioend->io_inode)->i_iocount);
 	ioend->io_offset = 0;
 	ioend->io_size = 0;
 	ioend->io_iocb = NULL;
@@ -300,6 +299,18 @@ xfs_alloc_ioend(
 	return ioend;
 }
 
+STATIC xfs_ioend_t *
+xfs_alloc_ioend(
+	struct inode		*inode,
+	unsigned int		type)
+{
+	struct xfs_ioend	*ioend;
+
+	ioend = __xfs_alloc_ioend(inode, type);
+	atomic_inc(&XFS_I(ioend->io_inode)->i_iocount);
+	return ioend;
+}
+
 STATIC int
 xfs_map_blocks(
 	struct inode		*inode,
@@ -1318,6 +1329,7 @@ xfs_end_io_direct_write(
 	 */
 	iocb->private = NULL;
 
+	atomic_inc(&XFS_I(ioend->io_inode)->i_iocount);
 	ioend->io_offset = offset;
 	ioend->io_size = size;
 	if (private && size > 0)
@@ -1354,7 +1366,7 @@ xfs_vm_direct_IO(
 	ssize_t			ret;
 
 	if (rw & WRITE) {
-		iocb->private = xfs_alloc_ioend(inode, IO_DIRECT);
+		iocb->private = __xfs_alloc_ioend(inode, IO_DIRECT);
 
 		ret = __blockdev_direct_IO(rw, iocb, inode, bdev, iov,
 					    offset, nr_segs,
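
The effect of the patch can be sketched with a toy counter model. The names are hypothetical, and the model only captures where the count is raised, not the real xfs_ioend lifecycle.

Before the patch, the per-inode count is raised when the direct write is submitted, so it stays raised while the bio sits in a throttle queue. With the patch, it is raised only once the bio has completed and end_io processing starts.

The same model also shows the catch raised later in the thread: when the count is taken at completion, nothing records that a throttled bio is still in flight.

```c
#include <stdbool.h>

/* Where the per-inode count is raised; hypothetical names that only
 * mirror the before/after placement of atomic_inc() in the patch. */
enum count_policy {
	COUNT_AT_SUBMIT,	/* before the patch: xfs_alloc_ioend() in xfs_vm_direct_IO() */
	COUNT_AT_COMPLETION,	/* after the patch: xfs_end_io_direct_write() */
};

struct model_inode {
	int	iocount;	/* stands in for i_iocount */
};

/* Direct write handed to the block layer; it may now sit in a
 * throttle queue for a long time. */
static void submit_dio(struct model_inode *ip, enum count_policy p)
{
	if (p == COUNT_AT_SUBMIT)
		ip->iocount++;
}

/* The bio completed and end_io processing begins. */
static void end_io_begin(struct model_inode *ip, enum count_policy p)
{
	if (p == COUNT_AT_COMPLETION)
		ip->iocount++;
}

/* End_io processing (e.g. unwritten extent conversion) finished. */
static void end_io_finish(struct model_inode *ip)
{
	ip->iocount--;
}

/* sync waits while the count is non-zero. */
static bool sync_must_wait(const struct model_inode *ip)
{
	return ip->iocount > 0;
}
```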

* Re: Query about DIO/AIO WRITE throttling and ext4 serialization
From: Ted Ts'o @ 2011-06-03  1:11 UTC
  To: Vivek Goyal; +Cc: Dave Chinner, linux-ext4

On Thu, Jun 02, 2011 at 08:54:03PM -0400, Vivek Goyal wrote:
> 
> In this case I ran "sync" while aio-stress was doing O_DIRECT writes.
> I don't really have a real-world example; I just cooked up a hypothetical
> scenario.
>
> I'm just wondering why ext4 and XFS behave differently and which
> behavior is more appropriate. ext4 does not seem to wait for all
> pending AIO/DIO to finish, while XFS does.

I think this is something we can chalk up to "different
implementations do different things".  I'm not convinced either
behaviour is wrong per se.  Granted, the recent work to make sync and
fsync not livelock in the face of continuing writes means that I'm
happier with ext4's behaviour, but I don't think that means xfs's
behavior is wrong.

One of the things that I have thought about is a sysctl which makes sync
a no-op unless you are root.  The reason for that is that many system
administrators have a habit of typing sync, and on a heavily
loaded production server, this can really cause performance to go to
hell for up to tens of minutes.  So it might make sense to prevent
non-root users from trashing overall system performance by running
sync...

						- Ted

* Re: Query about DIO/AIO WRITE throttling and ext4 serialization
From: Vivek Goyal @ 2011-06-03  1:28 UTC
  To: Christoph Hellwig; +Cc: Ted Ts'o, Dave Chinner, linux-ext4

On Thu, Jun 02, 2011 at 09:02:33PM -0400, Christoph Hellwig wrote:
> On Thu, Jun 02, 2011 at 08:54:03PM -0400, Vivek Goyal wrote:
> > I'm just wondering why ext4 and XFS behave differently and which
> > behavior is more appropriate. ext4 does not seem to wait for all
> > pending AIO/DIO to finish, while XFS does.
> 
> They're both wrong.  Ext4 completely lacks support in fsync or sync
> for catching pending unwritten extent conversions, and thus fails to obey
> the data integrity guarantee.  XFS is being rather stupid about the
> amount of synchronization it requires.  The untested patch below
> should help with avoiding the synchronization if you're purely doing
> overwrites:

Yes, this patch helps. I have already laid out the file and am doing
overwrites.

I throttled aio-stress in one cgroup to 1 byte/sec, edited another
file from another cgroup, ran "sync", and it completed.
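
The sketch below shows the "lay out the file, then purely overwrite" pattern. It rests on a few stated assumptions:

- The file is laid out by writing zeros. This differs from fallocate(), which on ext4 and XFS creates unwritten extents that still need conversion on the first write.
- A 4096-byte I/O size and buffer alignment are assumed to satisfy O_DIRECT on the target device.
- If the filesystem rejects O_DIRECT with EINVAL, the code falls back to a buffered write.

The helper names and the file path are made up for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE			/* exposes O_DIRECT on Linux/glibc */
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0			/* headers did not expose it: buffered only */
#endif

#define BLK 4096			/* assumed I/O size and alignment */

/* Lay out the file by writing zeros, then fsync once, so every block
 * is allocated and written before any overwrite happens. */
static int layout_file(const char *path, int nblocks)
{
	static char zero[BLK];		/* static, so zero-initialised */
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (fd < 0)
		return -1;
	for (int i = 0; i < nblocks; i++) {
		if (write(fd, zero, BLK) != BLK) {
			close(fd);
			return -1;
		}
	}
	int rc = fsync(fd);
	close(fd);
	return rc;
}

/* Overwrite block idx in place: try O_DIRECT first, then fall back to a
 * buffered write if the filesystem rejects O_DIRECT with EINVAL. */
static int overwrite_block(const char *path, int idx, char fill)
{
	const int flags[2] = { O_WRONLY | O_DIRECT, O_WRONLY };
	void *buf;
	int rc = -1;

	if (posix_memalign(&buf, BLK, BLK) != 0)
		return -1;
	memset(buf, fill, BLK);

	for (int i = 0; i < 2 && rc != 0; i++) {
		int fd = open(path, flags[i]);
		if (fd < 0) {
			if (errno == EINVAL)
				continue;	/* O_DIRECT refused at open time */
			break;
		}
		ssize_t n = pwrite(fd, buf, BLK, (off_t)idx * BLK);
		int err = errno;
		close(fd);
		if (n == BLK)
			rc = 0;
		else if (!(n < 0 && err == EINVAL))
			break;		/* a real error, not an O_DIRECT refusal */
	}
	free(buf);
	return rc;
}
```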

Thanks
Vivek

> 
> 
> Index: xfs/fs/xfs/linux-2.6/xfs_aops.c
> ===================================================================
> --- xfs.orig/fs/xfs/linux-2.6/xfs_aops.c	2011-06-03 09:54:52.964337556 +0900
> +++ xfs/fs/xfs/linux-2.6/xfs_aops.c	2011-06-03 09:57:06.877674259 +0900
> @@ -270,7 +270,7 @@ xfs_finish_ioend_sync(
>   * (vs. incore size).
>   */
>  STATIC xfs_ioend_t *
> -xfs_alloc_ioend(
> +__xfs_alloc_ioend(
>  	struct inode		*inode,
>  	unsigned int		type)
>  {
> @@ -290,7 +290,6 @@ xfs_alloc_ioend(
>  	ioend->io_inode = inode;
>  	ioend->io_buffer_head = NULL;
>  	ioend->io_buffer_tail = NULL;
> -	atomic_inc(&XFS_I(ioend->io_inode)->i_iocount);
>  	ioend->io_offset = 0;
>  	ioend->io_size = 0;
>  	ioend->io_iocb = NULL;
> @@ -300,6 +299,18 @@ xfs_alloc_ioend(
>  	return ioend;
>  }
>  
> +STATIC xfs_ioend_t *
> +xfs_alloc_ioend(
> +	struct inode		*inode,
> +	unsigned int		type)
> +{
> +	struct xfs_ioend	*ioend;
> +
> +	ioend = __xfs_alloc_ioend(inode, type);
> +	atomic_inc(&XFS_I(ioend->io_inode)->i_iocount);
> +	return ioend;
> +}
> +
>  STATIC int
>  xfs_map_blocks(
>  	struct inode		*inode,
> @@ -1318,6 +1329,7 @@ xfs_end_io_direct_write(
>  	 */
>  	iocb->private = NULL;
>  
> +	atomic_inc(&XFS_I(ioend->io_inode)->i_iocount);
>  	ioend->io_offset = offset;
>  	ioend->io_size = size;
>  	if (private && size > 0)
> @@ -1354,7 +1366,7 @@ xfs_vm_direct_IO(
>  	ssize_t			ret;
>  
>  	if (rw & WRITE) {
> -		iocb->private = xfs_alloc_ioend(inode, IO_DIRECT);
> +		iocb->private = __xfs_alloc_ioend(inode, IO_DIRECT);
>  
>  		ret = __blockdev_direct_IO(rw, iocb, inode, bdev, iov,
>  					    offset, nr_segs,

* Re: Query about DIO/AIO WRITE throttling and ext4 serialization
From: Vivek Goyal @ 2011-06-03  1:33 UTC
  To: Christoph Hellwig; +Cc: Ted Ts'o, Dave Chinner, linux-ext4

On Thu, Jun 02, 2011 at 09:28:58PM -0400, Vivek Goyal wrote:
> On Thu, Jun 02, 2011 at 09:02:33PM -0400, Christoph Hellwig wrote:
> > On Thu, Jun 02, 2011 at 08:54:03PM -0400, Vivek Goyal wrote:
> > > I'm just wondering why ext4 and XFS behave differently and which
> > > behavior is more appropriate. ext4 does not seem to wait for all
> > > pending AIO/DIO to finish, while XFS does.
> > 
> > They're both wrong.  Ext4 completely lacks support in fsync or sync
> > for catching pending unwritten extent conversions, and thus fails to obey
> > the data integrity guarantee.  XFS is being rather stupid about the
> > amount of synchronization it requires.  The untested patch below
> > should help with avoiding the synchronization if you're purely doing
> > overwrites:
> 
> Yes, this patch helps. I have already laid out the file and am doing
> overwrites.
> 
> I throttled aio-stress in one cgroup to 1 byte/sec, edited another
> file from another cgroup, ran "sync", and it completed.

The other test, where I ran aio-stress in one window, edited a file
in another window and typed "sync", also worked: "sync" does not hang
waiting for aio-stress to finish.

Thanks
Vivek

* Re: Query about DIO/AIO WRITE throttling and ext4 serialization
From: Eric Sandeen @ 2011-06-03  3:30 UTC
  To: Christoph Hellwig; +Cc: Vivek Goyal, Ted Ts'o, Dave Chinner, linux-ext4

On 6/2/11 8:02 PM, Christoph Hellwig wrote:
> On Thu, Jun 02, 2011 at 08:54:03PM -0400, Vivek Goyal wrote:
>> I'm just wondering why ext4 and XFS behave differently and which
>> behavior is more appropriate. ext4 does not seem to wait for all
>> pending AIO/DIO to finish, while XFS does.
> 
> They're both wrong.  Ext4 completely lacks support in fsync or sync
> for catching pending unwritten extent conversions, and thus fails to obey
> the data integrity guarantee.

I'm not sure about that.

ext4_sync_file() calls ext4_flush_completed_IO(), whose comment says:

 * When IO is completed, the work to convert unwritten extents to
 * written is queued on workqueue but may not get immediately
 * scheduled. When fsync is called, we need to ensure the
 * conversion is complete before fsync returns.
 * The inode keeps track of a list of pending/completed IO that
 * might needs to do the conversion. This function walks through
 * the list and convert the related unwritten extents for completed IO
 * to written.

Granted, I get easily lost in ext4's codepaths here, which is actually
why I suggested Vivek pose these questions to the list ;)

-Eric

* Re: Query about DIO/AIO WRITE throttling and ext4 serialization
From: Christoph Hellwig @ 2011-06-03  5:00 UTC
  To: Eric Sandeen
  Cc: Christoph Hellwig, Vivek Goyal, Ted Ts'o, Dave Chinner, linux-ext4

On Thu, Jun 02, 2011 at 10:30:38PM -0500, Eric Sandeen wrote:
> On 6/2/11 8:02 PM, Christoph Hellwig wrote:
> > On Thu, Jun 02, 2011 at 08:54:03PM -0400, Vivek Goyal wrote:
> >> I'm just wondering why ext4 and XFS behave differently and which
> >> behavior is more appropriate. ext4 does not seem to wait for all
> >> pending AIO/DIO to finish, while XFS does.
> > 
> > They're both wrong.  Ext4 completely lacks support in fsync or sync
> > for catching pending unwritten extent conversions, and thus fails to obey
> > the data integrity guarantee.
> 
> I'm not sure about that.
> 
> ext4_sync_file() calls ext4_flush_completed_IO(), whose comment says:

> Granted, I get easily lost in ext4's codepaths here, which is actually
> why I suggested Vivek pose these questions to the list ;)

You're right, it gets fsync right, but sync handling still seems to be
missing, which affects not just sync, but also the syncfs system call
and unmount.


* Re: Query about DIO/AIO WRITE throttling and ext4 serialization
From: Christoph Hellwig @ 2011-06-09 13:09 UTC
  To: Vivek Goyal; +Cc: Christoph Hellwig, Ted Ts'o, Dave Chinner, linux-ext4

On Thu, Jun 02, 2011 at 09:33:45PM -0400, Vivek Goyal wrote:
> > Yes, this patch helps. I have already laid out the file and am doing
> > overwrites.
> > 
> > I throttled aio-stress in one cgroup to 1 byte/sec, edited another
> > file from another cgroup, ran "sync", and it completed.
> 
> The other test, where I ran aio-stress in one window, edited a file
> in another window and typed "sync", also worked: "sync" does not hang
> waiting for aio-stress to finish.

I've been thinking about the patch a bit more, and I think it's simply
incorrect.  i_iocount is the only thing that actually tracks in-flight
DIO/AIO requests, so we can't actually skip incrementing it as that
means we can't wait for pending AIO in fsync/sync/inode reclaim or
remount r/o.

We could simply declare AIO off limits for sync and skip it there,
which is easily doable, but we'd still need a special-case version of
sync for remount r/o, as that absolutely needs to stop all pending I/O.

Of the other filesystems, ext4 also has the counter, but only waits for
it during inode teardown; it uses a slightly different, but also
effective, scheme for fsync, but completely ignores sync and remount.

I couldn't find a similar scheme in other filesystems supporting AIO,
but it might be hidden a bit better.

I suspect we could optimize things by using the dual count and list
approach ext4 uses: there is a counter for in-flight direct I/O, which
we only check for inode teardown and remount, as those need to stop
pending I/O, but sync and fsync can skip it, as they only need to
flush pending I/O.  There is also a list of pending unwritten extent
conversions that only gets appended to once the actual I/O is done
and the unwritten extent conversion has been queued up.
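
The sketch below is a single-threaded model of that dual count-and-list idea. It uses hypothetical names and a trivial stand-in for the real extent conversion.

In this model, sync/fsync only drain the list of completed I/O awaiting conversion and never wait on the in-flight counter. Inode teardown and remount r/o must additionally wait for that counter to drain.

```c
#include <stdbool.h>
#include <stdlib.h>

/* A completed direct write whose unwritten extents still need converting. */
struct pending_conv {
	long			offset, len;
	struct pending_conv	*next;
};

/* Hypothetical per-inode bookkeeping for the dual scheme. */
struct dual_inode {
	int			dio_in_flight;	/* submitted, not yet completed */
	struct pending_conv	*conv_list;	/* completed, conversion pending */
	long			converted_bytes; /* stand-in for real conversion work */
};

static void dio_submit(struct dual_inode *ip)
{
	ip->dio_in_flight++;
}

/* Bio completed: queue its conversion, then drop the in-flight count. */
static int dio_complete(struct dual_inode *ip, long offset, long len)
{
	struct pending_conv *pc = malloc(sizeof(*pc));

	if (!pc)
		return -1;
	pc->offset = offset;
	pc->len = len;
	pc->next = ip->conv_list;
	ip->conv_list = pc;
	ip->dio_in_flight--;
	return 0;
}

/* sync/fsync: convert what has already completed; never waits on
 * dio_in_flight, so a throttled bio cannot hold it up. */
static void flush_completed(struct dual_inode *ip)
{
	while (ip->conv_list) {
		struct pending_conv *pc = ip->conv_list;

		ip->conv_list = pc->next;
		ip->converted_bytes += pc->len;
		free(pc);
	}
}

/* Inode teardown / remount r/o: must stop all I/O, so besides flushing
 * it also needs dio_in_flight to drain. Returns false if still busy. */
static bool can_teardown(struct dual_inode *ip)
{
	flush_completed(ip);
	return ip->dio_in_flight == 0;
}
```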

I'll see if I can come up with a good scheme for that, preferably
sitting directly in the direct I/O code, so that everyone gets it
without additional work.

Thread overview: 17+ messages
2011-06-01 21:50 Query about DIO/AIO WRITE throttling and ext4 serialization Vivek Goyal
2011-06-02  1:22 ` Dave Chinner
2011-06-02 14:17   ` Vivek Goyal
2011-06-02 14:36     ` Vivek Goyal
2011-06-02 15:56       ` Vivek Goyal
2011-06-02 23:51         ` Dave Chinner
2011-06-03  0:27           ` Vivek Goyal
2011-06-03  0:43             ` Ted Ts'o
2011-06-03  0:54               ` Vivek Goyal
2011-06-03  1:02                 ` Christoph Hellwig
2011-06-03  1:28                   ` Vivek Goyal
2011-06-03  1:33                     ` Vivek Goyal
2011-06-09 13:09                       ` Christoph Hellwig
2011-06-03  3:30                   ` Eric Sandeen
2011-06-03  5:00                     ` Christoph Hellwig
2011-06-03  1:11                 ` Ted Ts'o
2011-06-02 23:46     ` Dave Chinner
