All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] Fix aio completion vs unwritten extents
@ 2010-06-22 12:21 ` Christoph Hellwig
  0 siblings, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2010-06-22 12:21 UTC (permalink / raw)
  To: linux-fsdevel, xfs, linux-ext4

Some filesystems (XFS and ext4) have support for a concept called
unwritten extents, where we can write data into holes / preallocated
space and only mark them as allocated when the data I/O has finished.

Because the transaction to convert the extent can't be submitted from
I/O completion, which normally happens from IRQ context it needs to
be defered to a workqueue.  This is not a problem for buffered I/O
where we keep the data in cache at least until the I/O operation has
finished, but it is an issue for direct I/O.  XFS avoids that problem
for synchronous direct I/O by waiting for all unwritten extent conversions
to finish if we did one during direct I/O, but so far has ignored the
problem for asynchronous I/O.  Unfortunately the race is very easy
to hit by using QEMU with native AIO support on a sparse image, and
the result is filesystem corruption in the guest.

This contains core direct I/O changes to allow the filesystem to delay
AIO completion, as well as a patch to fix XFS.  ext4 also has the same
issue, and from a quick look also doesn't properly complete unwritten
extent conversions for synchronous direct I/O, but I'll leave that
for someone more familar to figure out.

Below is a minimal reproducer for the issue.  Given that we're dealing
with a race condition it doesn't always fail, but in 2 core laptop
it triggers 100% reproducibly in 20 runs in a loop.

---

#define _GNU_SOURCE

#include <sys/stat.h>
#include <sys/types.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#include <libaio.h>

#define BUF_SIZE	4096
#define IO_PATTERN	0xab

int main(int argc, char *argv[])
{
	struct io_context *ctx = NULL;
	struct io_event ev;
	struct iocb iocb, *iocbs[] = { &iocb };
	void *buf;
	char cmp_buf[BUF_SIZE];
	int fd, err = 0;

	fd = open(argv[1], O_DIRECT | O_CREAT | O_TRUNC | O_RDWR, 0600);
	if (fd == -1) {
		perror("open");
		return 1;
	}

	err = posix_memalign(&buf, BUF_SIZE, BUF_SIZE);
	if (err) {
		fprintf(stderr, "error %s during %s\n",
			strerror(-err),
			"posix_memalign");
		return 1;
	}
	memset(buf, IO_PATTERN, BUF_SIZE);
	memset(cmp_buf, IO_PATTERN, BUF_SIZE);

	/*
	 * Truncate to some random large file size.  Just make sure
	 * it's not smaller than our I/O size.
	 */
	if (ftruncate(fd, 1024 * 1024 * 1024) < 0) {
		perror("ftruncate");
		return 1;
	}


	/*
	 * Do a simple 4k write into a hole using aio.
	 */
	err = io_setup(1, &ctx);
	if (err) {
		fprintf(stderr, "error %s during %s\n",
			strerror(-err),
			"io_setup");
		return 1;
	}

	io_prep_pwrite(&iocb, fd, buf, BUF_SIZE, 0);

	err = io_submit(ctx, 1, iocbs);
	if (err != 1) {
		fprintf(stderr, "error %s during %s\n",
			strerror(-err),
			"io_submit");
		return 1;
	}

	err = io_getevents(ctx, 1, 1, &ev, NULL);
	if (err != 1) {
		fprintf(stderr, "error %s during %s\n",
			strerror(-err),
			"io_getevents");
		return 1;
	}

	/*
	 * And then read it back.
	 *
	 * Using pread to keep it simple, but AIO has the same effect.
	 */
	if (pread(fd, buf, BUF_SIZE, 0) != BUF_SIZE) {
		perror("pread");
		return 1;
	}

	/*
	 * And depending on the machine we'll just get zeroes back quite
	 * often here.  That's because the unwritten extent conversion
	 * hasn't finished.
	 */
	if (memcmp(buf, cmp_buf, BUF_SIZE)) {
		unsigned long long *ubuf = (unsigned long long *)buf;
		int i;

		for (i = 0; i < BUF_SIZE / sizeof(unsigned long long); i++)
			printf("%d: 0x%llx\n", i, ubuf[i]);
			
		return 1;
	}

	return 0;
}

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 0/2] Fix aio completion vs unwritten extents
@ 2010-06-22 12:21 ` Christoph Hellwig
  0 siblings, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2010-06-22 12:21 UTC (permalink / raw)
  To: linux-fsdevel, xfs, linux-ext4

Some filesystems (XFS and ext4) have support for a concept called
unwritten extents, where we can write data into holes / preallocated
space and only mark them as allocated when the data I/O has finished.

Because the transaction to convert the extent can't be submitted from
I/O completion, which normally happens from IRQ context it needs to
be defered to a workqueue.  This is not a problem for buffered I/O
where we keep the data in cache at least until the I/O operation has
finished, but it is an issue for direct I/O.  XFS avoids that problem
for synchronous direct I/O by waiting for all unwritten extent conversions
to finish if we did one during direct I/O, but so far has ignored the
problem for asynchronous I/O.  Unfortunately the race is very easy
to hit by using QEMU with native AIO support on a sparse image, and
the result is filesystem corruption in the guest.

This contains core direct I/O changes to allow the filesystem to delay
AIO completion, as well as a patch to fix XFS.  ext4 also has the same
issue, and from a quick look also doesn't properly complete unwritten
extent conversions for synchronous direct I/O, but I'll leave that
for someone more familar to figure out.

Below is a minimal reproducer for the issue.  Given that we're dealing
with a race condition it doesn't always fail, but in 2 core laptop
it triggers 100% reproducibly in 20 runs in a loop.

---

#define _GNU_SOURCE

#include <sys/stat.h>
#include <sys/types.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#include <libaio.h>

#define BUF_SIZE	4096
#define IO_PATTERN	0xab

int main(int argc, char *argv[])
{
	struct io_context *ctx = NULL;
	struct io_event ev;
	struct iocb iocb, *iocbs[] = { &iocb };
	void *buf;
	char cmp_buf[BUF_SIZE];
	int fd, err = 0;

	fd = open(argv[1], O_DIRECT | O_CREAT | O_TRUNC | O_RDWR, 0600);
	if (fd == -1) {
		perror("open");
		return 1;
	}

	err = posix_memalign(&buf, BUF_SIZE, BUF_SIZE);
	if (err) {
		fprintf(stderr, "error %s during %s\n",
			strerror(-err),
			"posix_memalign");
		return 1;
	}
	memset(buf, IO_PATTERN, BUF_SIZE);
	memset(cmp_buf, IO_PATTERN, BUF_SIZE);

	/*
	 * Truncate to some random large file size.  Just make sure
	 * it's not smaller than our I/O size.
	 */
	if (ftruncate(fd, 1024 * 1024 * 1024) < 0) {
		perror("ftruncate");
		return 1;
	}


	/*
	 * Do a simple 4k write into a hole using aio.
	 */
	err = io_setup(1, &ctx);
	if (err) {
		fprintf(stderr, "error %s during %s\n",
			strerror(-err),
			"io_setup");
		return 1;
	}

	io_prep_pwrite(&iocb, fd, buf, BUF_SIZE, 0);

	err = io_submit(ctx, 1, iocbs);
	if (err != 1) {
		fprintf(stderr, "error %s during %s\n",
			strerror(-err),
			"io_submit");
		return 1;
	}

	err = io_getevents(ctx, 1, 1, &ev, NULL);
	if (err != 1) {
		fprintf(stderr, "error %s during %s\n",
			strerror(-err),
			"io_getevents");
		return 1;
	}

	/*
	 * And then read it back.
	 *
	 * Using pread to keep it simple, but AIO has the same effect.
	 */
	if (pread(fd, buf, BUF_SIZE, 0) != BUF_SIZE) {
		perror("pread");
		return 1;
	}

	/*
	 * And depending on the machine we'll just get zeroes back quite
	 * often here.  That's because the unwritten extent conversion
	 * hasn't finished.
	 */
	if (memcmp(buf, cmp_buf, BUF_SIZE)) {
		unsigned long long *ubuf = (unsigned long long *)buf;
		int i;

		for (i = 0; i < BUF_SIZE / sizeof(unsigned long long); i++)
			printf("%d: 0x%llx\n", i, ubuf[i]);
			
		return 1;
	}

	return 0;
}

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/2] direct-io: move aio_complete into ->end_io
  2010-06-22 12:21 ` Christoph Hellwig
@ 2010-06-22 12:21   ` Christoph Hellwig
  -1 siblings, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2010-06-22 12:21 UTC (permalink / raw)
  To: linux-fsdevel, xfs, linux-ext4

[-- Attachment #1: direct-io-unwritten-fix --]
[-- Type: text/plain, Size: 6713 bytes --]

Filesystems with unwritten extent support must not complete an AIO request
until the transaction to convert the extent has been commited.  That means
the aio_complete calls needs to be moved into the ->end_io callback so
that the filesystem can control when to call it exactly.

This makes a bit of a mess out of dio_complete and the ->end_io callback
prototype even more complicated.  In addition ->end_io is now called with
i_alloc_sem held for DIO_LOCKING filesystems.  The only filesystem that
has both and ->end_io callback and sets DIO_LOCKING is ext4, which doesn't
appear to do anything that could deadlock with i_alloc_sem in ->end_io.

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: linux-2.6/fs/direct-io.c
===================================================================
--- linux-2.6.orig/fs/direct-io.c	2010-06-22 09:48:37.239004298 +0200
+++ linux-2.6/fs/direct-io.c	2010-06-22 11:54:42.281003878 +0200
@@ -218,7 +218,7 @@ static struct page *dio_get_page(struct
  * filesystems can use it to hold additional state between get_block calls and
  * dio_complete.
  */
-static int dio_complete(struct dio *dio, loff_t offset, int ret)
+static int dio_complete(struct dio *dio, loff_t offset, int ret, bool is_async)
 {
 	ssize_t transferred = 0;
 
@@ -239,14 +239,6 @@ static int dio_complete(struct dio *dio,
 			transferred = dio->i_size - offset;
 	}
 
-	if (dio->end_io && dio->result)
-		dio->end_io(dio->iocb, offset, transferred,
-			    dio->map_bh.b_private);
-
-	if (dio->flags & DIO_LOCKING)
-		/* lockdep: non-owner release */
-		up_read_non_owner(&dio->inode->i_alloc_sem);
-
 	if (ret == 0)
 		ret = dio->page_errors;
 	if (ret == 0)
@@ -254,6 +246,17 @@ static int dio_complete(struct dio *dio,
 	if (ret == 0)
 		ret = transferred;
 
+	if (dio->end_io && dio->result) {
+		dio->end_io(dio->iocb, offset, transferred,
+			    dio->map_bh.b_private, ret, is_async);
+	} else if (is_async) {
+		aio_complete(dio->iocb, ret, 0);
+	}
+
+	if (dio->flags & DIO_LOCKING)
+		/* lockdep: non-owner release */
+		up_read_non_owner(&dio->inode->i_alloc_sem);
+
 	return ret;
 }
 
@@ -277,8 +280,7 @@ static void dio_bio_end_aio(struct bio *
 	spin_unlock_irqrestore(&dio->bio_lock, flags);
 
 	if (remaining == 0) {
-		int ret = dio_complete(dio, dio->iocb->ki_pos, 0);
-		aio_complete(dio->iocb, ret, 0);
+		dio_complete(dio, dio->iocb->ki_pos, 0, true);
 		kfree(dio);
 	}
 }
@@ -1126,7 +1128,7 @@ direct_io_worker(int rw, struct kiocb *i
 	spin_unlock_irqrestore(&dio->bio_lock, flags);
 
 	if (ret2 == 0) {
-		ret = dio_complete(dio, offset, ret);
+		ret = dio_complete(dio, offset, ret, false);
 		kfree(dio);
 	} else
 		BUG_ON(ret != -EIOCBQUEUED);
Index: linux-2.6/fs/ext4/inode.c
===================================================================
--- linux-2.6.orig/fs/ext4/inode.c	2010-06-22 09:48:37.249004508 +0200
+++ linux-2.6/fs/ext4/inode.c	2010-06-22 12:18:45.883255381 +0200
@@ -3775,7 +3775,8 @@ static ext4_io_end_t *ext4_init_io_end (
 }
 
 static void ext4_end_io_dio(struct kiocb *iocb, loff_t offset,
-			    ssize_t size, void *private)
+			    ssize_t size, void *private, int ret,
+			    bool is_async)
 {
         ext4_io_end_t *io_end = iocb->private;
 	struct workqueue_struct *wq;
@@ -3784,7 +3785,7 @@ static void ext4_end_io_dio(struct kiocb
 
 	/* if not async direct IO or dio with 0 bytes write, just return */
 	if (!io_end || !size)
-		return;
+		goto out;
 
 	ext_debug("ext4_end_io_dio(): io_end 0x%p"
 		  "for inode %lu, iocb 0x%p, offset %llu, size %llu\n",
@@ -3795,7 +3796,7 @@ static void ext4_end_io_dio(struct kiocb
 	if (io_end->flag != EXT4_IO_UNWRITTEN){
 		ext4_free_io_end(io_end);
 		iocb->private = NULL;
-		return;
+		goto out;
 	}
 
 	io_end->offset = offset;
@@ -3812,6 +3813,9 @@ static void ext4_end_io_dio(struct kiocb
 	list_add_tail(&io_end->list, &ei->i_completed_io_list);
 	spin_unlock_irqrestore(&ei->i_completed_io_lock, flags);
 	iocb->private = NULL;
+out:
+	if (is_async)
+		aio_complete(iocb, ret, 0);
 }
 
 static void ext4_end_io_buffer_write(struct buffer_head *bh, int uptodate)
Index: linux-2.6/fs/ocfs2/aops.c
===================================================================
--- linux-2.6.orig/fs/ocfs2/aops.c	2010-06-22 09:48:37.259012749 +0200
+++ linux-2.6/fs/ocfs2/aops.c	2010-06-22 12:19:03.931005757 +0200
@@ -609,7 +609,9 @@ bail:
 static void ocfs2_dio_end_io(struct kiocb *iocb,
 			     loff_t offset,
 			     ssize_t bytes,
-			     void *private)
+			     void *private,
+			     int ret,
+			     bool is_async)
 {
 	struct inode *inode = iocb->ki_filp->f_path.dentry->d_inode;
 	int level;
@@ -623,6 +625,9 @@ static void ocfs2_dio_end_io(struct kioc
 	if (!level)
 		up_read(&inode->i_alloc_sem);
 	ocfs2_rw_unlock(inode, level);
+
+	if (is_async)
+		aio_complete(iocb, ret, 0);
 }
 
 /*
Index: linux-2.6/fs/xfs/linux-2.6/xfs_aops.c
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_aops.c	2010-06-22 09:48:37.268012190 +0200
+++ linux-2.6/fs/xfs/linux-2.6/xfs_aops.c	2010-06-22 14:05:02.522005722 +0200
@@ -1599,7 +1599,9 @@ xfs_end_io_direct(
 	struct kiocb	*iocb,
 	loff_t		offset,
 	ssize_t		size,
-	void		*private)
+	void		*private,
+	int		ret,
+	bool		is_async)
 {
 	xfs_ioend_t	*ioend = iocb->private;
 
@@ -1645,6 +1647,9 @@ xfs_end_io_direct(
 	 * against double-freeing.
 	 */
 	iocb->private = NULL;
+
+	if (is_async)
+		aio_complete(iocb, ret, 0);
 }
 
 STATIC ssize_t
Index: linux-2.6/fs/xfs/linux-2.6/xfs_aops.h
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_aops.h	2010-06-22 09:48:37.278274238 +0200
+++ linux-2.6/fs/xfs/linux-2.6/xfs_aops.h	2010-06-22 09:49:12.388034051 +0200
@@ -37,6 +37,8 @@ typedef struct xfs_ioend {
 	size_t			io_size;	/* size of the extent */
 	xfs_off_t		io_offset;	/* offset in the file */
 	struct work_struct	io_work;	/* xfsdatad work queue */
+	struct kiocb		*io_iocb;
+	int			io_result;
 } xfs_ioend_t;
 
 extern const struct address_space_operations xfs_address_space_operations;
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h	2010-06-22 09:49:07.188253984 +0200
+++ linux-2.6/include/linux/fs.h	2010-06-22 10:34:10.128005975 +0200
@@ -415,7 +415,8 @@ struct buffer_head;
 typedef int (get_block_t)(struct inode *inode, sector_t iblock,
 			struct buffer_head *bh_result, int create);
 typedef void (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
-			ssize_t bytes, void *private);
+			ssize_t bytes, void *private, int ret,
+			bool is_async);
 
 /*
  * Attribute flags.  These should be or-ed together to figure out what


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/2] direct-io: move aio_complete into ->end_io
@ 2010-06-22 12:21   ` Christoph Hellwig
  0 siblings, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2010-06-22 12:21 UTC (permalink / raw)
  To: linux-fsdevel, xfs, linux-ext4

[-- Attachment #1: direct-io-unwritten-fix --]
[-- Type: text/plain, Size: 6834 bytes --]

Filesystems with unwritten extent support must not complete an AIO request
until the transaction to convert the extent has been commited.  That means
the aio_complete calls needs to be moved into the ->end_io callback so
that the filesystem can control when to call it exactly.

This makes a bit of a mess out of dio_complete and the ->end_io callback
prototype even more complicated.  In addition ->end_io is now called with
i_alloc_sem held for DIO_LOCKING filesystems.  The only filesystem that
has both and ->end_io callback and sets DIO_LOCKING is ext4, which doesn't
appear to do anything that could deadlock with i_alloc_sem in ->end_io.

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: linux-2.6/fs/direct-io.c
===================================================================
--- linux-2.6.orig/fs/direct-io.c	2010-06-22 09:48:37.239004298 +0200
+++ linux-2.6/fs/direct-io.c	2010-06-22 11:54:42.281003878 +0200
@@ -218,7 +218,7 @@ static struct page *dio_get_page(struct
  * filesystems can use it to hold additional state between get_block calls and
  * dio_complete.
  */
-static int dio_complete(struct dio *dio, loff_t offset, int ret)
+static int dio_complete(struct dio *dio, loff_t offset, int ret, bool is_async)
 {
 	ssize_t transferred = 0;
 
@@ -239,14 +239,6 @@ static int dio_complete(struct dio *dio,
 			transferred = dio->i_size - offset;
 	}
 
-	if (dio->end_io && dio->result)
-		dio->end_io(dio->iocb, offset, transferred,
-			    dio->map_bh.b_private);
-
-	if (dio->flags & DIO_LOCKING)
-		/* lockdep: non-owner release */
-		up_read_non_owner(&dio->inode->i_alloc_sem);
-
 	if (ret == 0)
 		ret = dio->page_errors;
 	if (ret == 0)
@@ -254,6 +246,17 @@ static int dio_complete(struct dio *dio,
 	if (ret == 0)
 		ret = transferred;
 
+	if (dio->end_io && dio->result) {
+		dio->end_io(dio->iocb, offset, transferred,
+			    dio->map_bh.b_private, ret, is_async);
+	} else if (is_async) {
+		aio_complete(dio->iocb, ret, 0);
+	}
+
+	if (dio->flags & DIO_LOCKING)
+		/* lockdep: non-owner release */
+		up_read_non_owner(&dio->inode->i_alloc_sem);
+
 	return ret;
 }
 
@@ -277,8 +280,7 @@ static void dio_bio_end_aio(struct bio *
 	spin_unlock_irqrestore(&dio->bio_lock, flags);
 
 	if (remaining == 0) {
-		int ret = dio_complete(dio, dio->iocb->ki_pos, 0);
-		aio_complete(dio->iocb, ret, 0);
+		dio_complete(dio, dio->iocb->ki_pos, 0, true);
 		kfree(dio);
 	}
 }
@@ -1126,7 +1128,7 @@ direct_io_worker(int rw, struct kiocb *i
 	spin_unlock_irqrestore(&dio->bio_lock, flags);
 
 	if (ret2 == 0) {
-		ret = dio_complete(dio, offset, ret);
+		ret = dio_complete(dio, offset, ret, false);
 		kfree(dio);
 	} else
 		BUG_ON(ret != -EIOCBQUEUED);
Index: linux-2.6/fs/ext4/inode.c
===================================================================
--- linux-2.6.orig/fs/ext4/inode.c	2010-06-22 09:48:37.249004508 +0200
+++ linux-2.6/fs/ext4/inode.c	2010-06-22 12:18:45.883255381 +0200
@@ -3775,7 +3775,8 @@ static ext4_io_end_t *ext4_init_io_end (
 }
 
 static void ext4_end_io_dio(struct kiocb *iocb, loff_t offset,
-			    ssize_t size, void *private)
+			    ssize_t size, void *private, int ret,
+			    bool is_async)
 {
         ext4_io_end_t *io_end = iocb->private;
 	struct workqueue_struct *wq;
@@ -3784,7 +3785,7 @@ static void ext4_end_io_dio(struct kiocb
 
 	/* if not async direct IO or dio with 0 bytes write, just return */
 	if (!io_end || !size)
-		return;
+		goto out;
 
 	ext_debug("ext4_end_io_dio(): io_end 0x%p"
 		  "for inode %lu, iocb 0x%p, offset %llu, size %llu\n",
@@ -3795,7 +3796,7 @@ static void ext4_end_io_dio(struct kiocb
 	if (io_end->flag != EXT4_IO_UNWRITTEN){
 		ext4_free_io_end(io_end);
 		iocb->private = NULL;
-		return;
+		goto out;
 	}
 
 	io_end->offset = offset;
@@ -3812,6 +3813,9 @@ static void ext4_end_io_dio(struct kiocb
 	list_add_tail(&io_end->list, &ei->i_completed_io_list);
 	spin_unlock_irqrestore(&ei->i_completed_io_lock, flags);
 	iocb->private = NULL;
+out:
+	if (is_async)
+		aio_complete(iocb, ret, 0);
 }
 
 static void ext4_end_io_buffer_write(struct buffer_head *bh, int uptodate)
Index: linux-2.6/fs/ocfs2/aops.c
===================================================================
--- linux-2.6.orig/fs/ocfs2/aops.c	2010-06-22 09:48:37.259012749 +0200
+++ linux-2.6/fs/ocfs2/aops.c	2010-06-22 12:19:03.931005757 +0200
@@ -609,7 +609,9 @@ bail:
 static void ocfs2_dio_end_io(struct kiocb *iocb,
 			     loff_t offset,
 			     ssize_t bytes,
-			     void *private)
+			     void *private,
+			     int ret,
+			     bool is_async)
 {
 	struct inode *inode = iocb->ki_filp->f_path.dentry->d_inode;
 	int level;
@@ -623,6 +625,9 @@ static void ocfs2_dio_end_io(struct kioc
 	if (!level)
 		up_read(&inode->i_alloc_sem);
 	ocfs2_rw_unlock(inode, level);
+
+	if (is_async)
+		aio_complete(iocb, ret, 0);
 }
 
 /*
Index: linux-2.6/fs/xfs/linux-2.6/xfs_aops.c
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_aops.c	2010-06-22 09:48:37.268012190 +0200
+++ linux-2.6/fs/xfs/linux-2.6/xfs_aops.c	2010-06-22 14:05:02.522005722 +0200
@@ -1599,7 +1599,9 @@ xfs_end_io_direct(
 	struct kiocb	*iocb,
 	loff_t		offset,
 	ssize_t		size,
-	void		*private)
+	void		*private,
+	int		ret,
+	bool		is_async)
 {
 	xfs_ioend_t	*ioend = iocb->private;
 
@@ -1645,6 +1647,9 @@ xfs_end_io_direct(
 	 * against double-freeing.
 	 */
 	iocb->private = NULL;
+
+	if (is_async)
+		aio_complete(iocb, ret, 0);
 }
 
 STATIC ssize_t
Index: linux-2.6/fs/xfs/linux-2.6/xfs_aops.h
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_aops.h	2010-06-22 09:48:37.278274238 +0200
+++ linux-2.6/fs/xfs/linux-2.6/xfs_aops.h	2010-06-22 09:49:12.388034051 +0200
@@ -37,6 +37,8 @@ typedef struct xfs_ioend {
 	size_t			io_size;	/* size of the extent */
 	xfs_off_t		io_offset;	/* offset in the file */
 	struct work_struct	io_work;	/* xfsdatad work queue */
+	struct kiocb		*io_iocb;
+	int			io_result;
 } xfs_ioend_t;
 
 extern const struct address_space_operations xfs_address_space_operations;
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h	2010-06-22 09:49:07.188253984 +0200
+++ linux-2.6/include/linux/fs.h	2010-06-22 10:34:10.128005975 +0200
@@ -415,7 +415,8 @@ struct buffer_head;
 typedef int (get_block_t)(struct inode *inode, sector_t iblock,
 			struct buffer_head *bh_result, int create);
 typedef void (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
-			ssize_t bytes, void *private);
+			ssize_t bytes, void *private, int ret,
+			bool is_async);
 
 /*
  * Attribute flags.  These should be or-ed together to figure out what

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 2/2] xfs: move aio completion after unwritten extent conversion
  2010-06-22 12:21 ` Christoph Hellwig
@ 2010-06-22 12:21   ` Christoph Hellwig
  -1 siblings, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2010-06-22 12:21 UTC (permalink / raw)
  To: linux-fsdevel, xfs, linux-ext4

[-- Attachment #1: xfs-unwritten-fix --]
[-- Type: text/plain, Size: 2288 bytes --]

If we write into an unwritten extent using AIO we need to complete the AIO
request after the extent conversion has finished.  Without that a read could
race to see see the extent still unwritten and return zeros.   For synchronous
I/O we already take care of that by flushing the xfsconvertd workqueue (which
might be a bit of overkill).

To do that add iocb and result fields to struct xfs_ioend, so that we can
call aio_complete from xfs_end_io after the extent conversion has happened.
Note that we need a new result field as io_error is used for positive errno
values, while the AIO code can return negative error values and positive
transfer sizes.

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: linux-2.6/fs/xfs/linux-2.6/xfs_aops.c
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_aops.c	2010-06-22 14:05:02.522005722 +0200
+++ linux-2.6/fs/xfs/linux-2.6/xfs_aops.c	2010-06-22 14:08:30.065006205 +0200
@@ -275,8 +275,11 @@ xfs_end_io(
 		xfs_finish_ioend(ioend, 0);
 		/* ensure we don't spin on blocked ioends */
 		delay(1);
-	} else
+	} else {
+		if (ioend->io_iocb)
+			aio_complete(ioend->io_iocb, ioend->io_result, 0);
 		xfs_destroy_ioend(ioend);
+	}
 }
 
 /*
@@ -309,6 +312,8 @@ xfs_alloc_ioend(
 	atomic_inc(&XFS_I(ioend->io_inode)->i_iocount);
 	ioend->io_offset = 0;
 	ioend->io_size = 0;
+	ioend->io_iocb = NULL;
+	ioend->io_result = 0;
 
 	INIT_WORK(&ioend->io_work, xfs_end_io);
 	return ioend;
@@ -1604,6 +1609,7 @@ xfs_end_io_direct(
 	bool		is_async)
 {
 	xfs_ioend_t	*ioend = iocb->private;
+	bool		complete_aio = is_async;
 
 	/*
 	 * Non-NULL private data means we need to issue a transaction to
@@ -1629,7 +1635,14 @@ xfs_end_io_direct(
 	if (ioend->io_type == IO_READ) {
 		xfs_finish_ioend(ioend, 0);
 	} else if (private && size > 0) {
-		xfs_finish_ioend(ioend, is_sync_kiocb(iocb));
+		if (is_async) {
+			ioend->io_iocb = iocb;
+			ioend->io_result = ret;
+			complete_aio = false;
+			xfs_finish_ioend(ioend, 0);
+		} else {
+			xfs_finish_ioend(ioend, 1);
+		}
 	} else {
 		/*
 		 * A direct I/O write ioend starts it's life in unwritten
@@ -1648,7 +1661,7 @@ xfs_end_io_direct(
 	 */
 	iocb->private = NULL;
 
-	if (is_async)
+	if (complete_aio)
 		aio_complete(iocb, ret, 0);
 }
 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 2/2] xfs: move aio completion after unwritten extent conversion
@ 2010-06-22 12:21   ` Christoph Hellwig
  0 siblings, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2010-06-22 12:21 UTC (permalink / raw)
  To: linux-fsdevel, xfs, linux-ext4

[-- Attachment #1: xfs-unwritten-fix --]
[-- Type: text/plain, Size: 2409 bytes --]

If we write into an unwritten extent using AIO we need to complete the AIO
request after the extent conversion has finished.  Without that a read could
race to see see the extent still unwritten and return zeros.   For synchronous
I/O we already take care of that by flushing the xfsconvertd workqueue (which
might be a bit of overkill).

To do that add iocb and result fields to struct xfs_ioend, so that we can
call aio_complete from xfs_end_io after the extent conversion has happened.
Note that we need a new result field as io_error is used for positive errno
values, while the AIO code can return negative error values and positive
transfer sizes.

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: linux-2.6/fs/xfs/linux-2.6/xfs_aops.c
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_aops.c	2010-06-22 14:05:02.522005722 +0200
+++ linux-2.6/fs/xfs/linux-2.6/xfs_aops.c	2010-06-22 14:08:30.065006205 +0200
@@ -275,8 +275,11 @@ xfs_end_io(
 		xfs_finish_ioend(ioend, 0);
 		/* ensure we don't spin on blocked ioends */
 		delay(1);
-	} else
+	} else {
+		if (ioend->io_iocb)
+			aio_complete(ioend->io_iocb, ioend->io_result, 0);
 		xfs_destroy_ioend(ioend);
+	}
 }
 
 /*
@@ -309,6 +312,8 @@ xfs_alloc_ioend(
 	atomic_inc(&XFS_I(ioend->io_inode)->i_iocount);
 	ioend->io_offset = 0;
 	ioend->io_size = 0;
+	ioend->io_iocb = NULL;
+	ioend->io_result = 0;
 
 	INIT_WORK(&ioend->io_work, xfs_end_io);
 	return ioend;
@@ -1604,6 +1609,7 @@ xfs_end_io_direct(
 	bool		is_async)
 {
 	xfs_ioend_t	*ioend = iocb->private;
+	bool		complete_aio = is_async;
 
 	/*
 	 * Non-NULL private data means we need to issue a transaction to
@@ -1629,7 +1635,14 @@ xfs_end_io_direct(
 	if (ioend->io_type == IO_READ) {
 		xfs_finish_ioend(ioend, 0);
 	} else if (private && size > 0) {
-		xfs_finish_ioend(ioend, is_sync_kiocb(iocb));
+		if (is_async) {
+			ioend->io_iocb = iocb;
+			ioend->io_result = ret;
+			complete_aio = false;
+			xfs_finish_ioend(ioend, 0);
+		} else {
+			xfs_finish_ioend(ioend, 1);
+		}
 	} else {
 		/*
 		 * A direct I/O write ioend starts it's life in unwritten
@@ -1648,7 +1661,7 @@ xfs_end_io_direct(
 	 */
 	iocb->private = NULL;
 
-	if (is_async)
+	if (complete_aio)
 		aio_complete(iocb, ret, 0);
 }
 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] direct-io: move aio_complete into ->end_io
  2010-06-22 12:21   ` Christoph Hellwig
@ 2010-06-24 21:59     ` Jan Kara
  -1 siblings, 0 replies; 18+ messages in thread
From: Jan Kara @ 2010-06-24 21:59 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, xfs, linux-ext4

On Tue 22-06-10 08:21:45, Christoph Hellwig wrote:
> Filesystems with unwritten extent support must not complete an AIO request
> until the transaction to convert the extent has been commited.  That means
> the aio_complete calls needs to be moved into the ->end_io callback so
> that the filesystem can control when to call it exactly.
> 
> This makes a bit of a mess out of dio_complete and the ->end_io callback
> prototype even more complicated.  In addition ->end_io is now called with
> i_alloc_sem held for DIO_LOCKING filesystems.  The only filesystem that
> has both and ->end_io callback and sets DIO_LOCKING is ext4, which doesn't
> appear to do anything that could deadlock with i_alloc_sem in ->end_io.
  Umm, I don't get this. Looking at the ->end_io callback it has been
always called with i_alloc_sem held. It's only aio_complete() which will
be called with i_alloc_sem held after your changes. Or am I missing
something?
  Moreover the async testing you do does not seem to be completely right.
dio->is_async is a flag that controls whether dio code waits for IO to be
completed or not. In particular it is not set for AIO that spans beyond
current i_size so it does not seem to be exactly what you need (at least
for ext4 it isn't). I think that is_sync_kiocb() is a test that should be
used to recognize AIO - and that has an advantage that you don't have to
pass the is_async flag around.

								Honza
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> Index: linux-2.6/fs/direct-io.c
> ===================================================================
> --- linux-2.6.orig/fs/direct-io.c	2010-06-22 09:48:37.239004298 +0200
> +++ linux-2.6/fs/direct-io.c	2010-06-22 11:54:42.281003878 +0200
> @@ -218,7 +218,7 @@ static struct page *dio_get_page(struct
>   * filesystems can use it to hold additional state between get_block calls and
>   * dio_complete.
>   */
> -static int dio_complete(struct dio *dio, loff_t offset, int ret)
> +static int dio_complete(struct dio *dio, loff_t offset, int ret, bool is_async)
>  {
>  	ssize_t transferred = 0;
>  
> @@ -239,14 +239,6 @@ static int dio_complete(struct dio *dio,
>  			transferred = dio->i_size - offset;
>  	}
>  
> -	if (dio->end_io && dio->result)
> -		dio->end_io(dio->iocb, offset, transferred,
> -			    dio->map_bh.b_private);
> -
> -	if (dio->flags & DIO_LOCKING)
> -		/* lockdep: non-owner release */
> -		up_read_non_owner(&dio->inode->i_alloc_sem);
> -
>  	if (ret == 0)
>  		ret = dio->page_errors;
>  	if (ret == 0)
> @@ -254,6 +246,17 @@ static int dio_complete(struct dio *dio,
>  	if (ret == 0)
>  		ret = transferred;
>  
> +	if (dio->end_io && dio->result) {
> +		dio->end_io(dio->iocb, offset, transferred,
> +			    dio->map_bh.b_private, ret, is_async);
> +	} else if (is_async) {
> +		aio_complete(dio->iocb, ret, 0);
> +	}
> +
> +	if (dio->flags & DIO_LOCKING)
> +		/* lockdep: non-owner release */
> +		up_read_non_owner(&dio->inode->i_alloc_sem);
> +
>  	return ret;
>  }
>  
> @@ -277,8 +280,7 @@ static void dio_bio_end_aio(struct bio *
>  	spin_unlock_irqrestore(&dio->bio_lock, flags);
>  
>  	if (remaining == 0) {
> -		int ret = dio_complete(dio, dio->iocb->ki_pos, 0);
> -		aio_complete(dio->iocb, ret, 0);
> +		dio_complete(dio, dio->iocb->ki_pos, 0, true);
>  		kfree(dio);
>  	}
>  }
> @@ -1126,7 +1128,7 @@ direct_io_worker(int rw, struct kiocb *i
>  	spin_unlock_irqrestore(&dio->bio_lock, flags);
>  
>  	if (ret2 == 0) {
> -		ret = dio_complete(dio, offset, ret);
> +		ret = dio_complete(dio, offset, ret, false);
>  		kfree(dio);
>  	} else
>  		BUG_ON(ret != -EIOCBQUEUED);
> Index: linux-2.6/fs/ext4/inode.c
> ===================================================================
> --- linux-2.6.orig/fs/ext4/inode.c	2010-06-22 09:48:37.249004508 +0200
> +++ linux-2.6/fs/ext4/inode.c	2010-06-22 12:18:45.883255381 +0200
> @@ -3775,7 +3775,8 @@ static ext4_io_end_t *ext4_init_io_end (
>  }
>  
>  static void ext4_end_io_dio(struct kiocb *iocb, loff_t offset,
> -			    ssize_t size, void *private)
> +			    ssize_t size, void *private, int ret,
> +			    bool is_async)
>  {
>          ext4_io_end_t *io_end = iocb->private;
>  	struct workqueue_struct *wq;
> @@ -3784,7 +3785,7 @@ static void ext4_end_io_dio(struct kiocb
>  
>  	/* if not async direct IO or dio with 0 bytes write, just return */
>  	if (!io_end || !size)
> -		return;
> +		goto out;
>  
>  	ext_debug("ext4_end_io_dio(): io_end 0x%p"
>  		  "for inode %lu, iocb 0x%p, offset %llu, size %llu\n",
> @@ -3795,7 +3796,7 @@ static void ext4_end_io_dio(struct kiocb
>  	if (io_end->flag != EXT4_IO_UNWRITTEN){
>  		ext4_free_io_end(io_end);
>  		iocb->private = NULL;
> -		return;
> +		goto out;
>  	}
>  
>  	io_end->offset = offset;
> @@ -3812,6 +3813,9 @@ static void ext4_end_io_dio(struct kiocb
>  	list_add_tail(&io_end->list, &ei->i_completed_io_list);
>  	spin_unlock_irqrestore(&ei->i_completed_io_lock, flags);
>  	iocb->private = NULL;
> +out:
> +	if (is_async)
> +		aio_complete(iocb, ret, 0);
>  }
>  
>  static void ext4_end_io_buffer_write(struct buffer_head *bh, int uptodate)
> Index: linux-2.6/fs/ocfs2/aops.c
> ===================================================================
> --- linux-2.6.orig/fs/ocfs2/aops.c	2010-06-22 09:48:37.259012749 +0200
> +++ linux-2.6/fs/ocfs2/aops.c	2010-06-22 12:19:03.931005757 +0200
> @@ -609,7 +609,9 @@ bail:
>  static void ocfs2_dio_end_io(struct kiocb *iocb,
>  			     loff_t offset,
>  			     ssize_t bytes,
> -			     void *private)
> +			     void *private,
> +			     int ret,
> +			     bool is_async)
>  {
>  	struct inode *inode = iocb->ki_filp->f_path.dentry->d_inode;
>  	int level;
> @@ -623,6 +625,9 @@ static void ocfs2_dio_end_io(struct kioc
>  	if (!level)
>  		up_read(&inode->i_alloc_sem);
>  	ocfs2_rw_unlock(inode, level);
> +
> +	if (is_async)
> +		aio_complete(iocb, ret, 0);
>  }
>  
>  /*
> Index: linux-2.6/fs/xfs/linux-2.6/xfs_aops.c
> ===================================================================
> --- linux-2.6.orig/fs/xfs/linux-2.6/xfs_aops.c	2010-06-22 09:48:37.268012190 +0200
> +++ linux-2.6/fs/xfs/linux-2.6/xfs_aops.c	2010-06-22 14:05:02.522005722 +0200
> @@ -1599,7 +1599,9 @@ xfs_end_io_direct(
>  	struct kiocb	*iocb,
>  	loff_t		offset,
>  	ssize_t		size,
> -	void		*private)
> +	void		*private,
> +	int		ret,
> +	bool		is_async)
>  {
>  	xfs_ioend_t	*ioend = iocb->private;
>  
> @@ -1645,6 +1647,9 @@ xfs_end_io_direct(
>  	 * against double-freeing.
>  	 */
>  	iocb->private = NULL;
> +
> +	if (is_async)
> +		aio_complete(iocb, ret, 0);
>  }
>  
>  STATIC ssize_t
> Index: linux-2.6/fs/xfs/linux-2.6/xfs_aops.h
> ===================================================================
> --- linux-2.6.orig/fs/xfs/linux-2.6/xfs_aops.h	2010-06-22 09:48:37.278274238 +0200
> +++ linux-2.6/fs/xfs/linux-2.6/xfs_aops.h	2010-06-22 09:49:12.388034051 +0200
> @@ -37,6 +37,8 @@ typedef struct xfs_ioend {
>  	size_t			io_size;	/* size of the extent */
>  	xfs_off_t		io_offset;	/* offset in the file */
>  	struct work_struct	io_work;	/* xfsdatad work queue */
> +	struct kiocb		*io_iocb;
> +	int			io_result;
>  } xfs_ioend_t;
>  
>  extern const struct address_space_operations xfs_address_space_operations;
> Index: linux-2.6/include/linux/fs.h
> ===================================================================
> --- linux-2.6.orig/include/linux/fs.h	2010-06-22 09:49:07.188253984 +0200
> +++ linux-2.6/include/linux/fs.h	2010-06-22 10:34:10.128005975 +0200
> @@ -415,7 +415,8 @@ struct buffer_head;
>  typedef int (get_block_t)(struct inode *inode, sector_t iblock,
>  			struct buffer_head *bh_result, int create);
>  typedef void (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
> -			ssize_t bytes, void *private);
> +			ssize_t bytes, void *private, int ret,
> +			bool is_async);
>  
>  /*
>   * Attribute flags.  These should be or-ed together to figure out what
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] direct-io: move aio_complete into ->end_io
@ 2010-06-24 21:59     ` Jan Kara
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Kara @ 2010-06-24 21:59 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, linux-ext4, xfs

On Tue 22-06-10 08:21:45, Christoph Hellwig wrote:
> Filesystems with unwritten extent support must not complete an AIO request
> until the transaction to convert the extent has been commited.  That means
> the aio_complete calls needs to be moved into the ->end_io callback so
> that the filesystem can control when to call it exactly.
> 
> This makes a bit of a mess out of dio_complete and the ->end_io callback
> prototype even more complicated.  In addition ->end_io is now called with
> i_alloc_sem held for DIO_LOCKING filesystems.  The only filesystem that
> has both and ->end_io callback and sets DIO_LOCKING is ext4, which doesn't
> appear to do anything that could deadlock with i_alloc_sem in ->end_io.
  Umm, I don't get this. Looking at the ->end_io callback it has been
always called with i_alloc_sem held. It's only aio_complete() which will
be called with i_alloc_sem held after your changes. Or am I missing
something?
  Moreover the async testing you do does not seem to be completely right.
dio->is_async is a flag that controls whether dio code waits for IO to be
completed or not. In particular it is not set for AIO that spans beyond
current i_size so it does not seem to be exactly what you need (at least
for ext4 it isn't). I think that is_sync_kiocb() is a test that should be
used to recognize AIO - and that has an advantage that you don't have to
pass the is_async flag around.

								Honza
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> Index: linux-2.6/fs/direct-io.c
> ===================================================================
> --- linux-2.6.orig/fs/direct-io.c	2010-06-22 09:48:37.239004298 +0200
> +++ linux-2.6/fs/direct-io.c	2010-06-22 11:54:42.281003878 +0200
> @@ -218,7 +218,7 @@ static struct page *dio_get_page(struct
>   * filesystems can use it to hold additional state between get_block calls and
>   * dio_complete.
>   */
> -static int dio_complete(struct dio *dio, loff_t offset, int ret)
> +static int dio_complete(struct dio *dio, loff_t offset, int ret, bool is_async)
>  {
>  	ssize_t transferred = 0;
>  
> @@ -239,14 +239,6 @@ static int dio_complete(struct dio *dio,
>  			transferred = dio->i_size - offset;
>  	}
>  
> -	if (dio->end_io && dio->result)
> -		dio->end_io(dio->iocb, offset, transferred,
> -			    dio->map_bh.b_private);
> -
> -	if (dio->flags & DIO_LOCKING)
> -		/* lockdep: non-owner release */
> -		up_read_non_owner(&dio->inode->i_alloc_sem);
> -
>  	if (ret == 0)
>  		ret = dio->page_errors;
>  	if (ret == 0)
> @@ -254,6 +246,17 @@ static int dio_complete(struct dio *dio,
>  	if (ret == 0)
>  		ret = transferred;
>  
> +	if (dio->end_io && dio->result) {
> +		dio->end_io(dio->iocb, offset, transferred,
> +			    dio->map_bh.b_private, ret, is_async);
> +	} else if (is_async) {
> +		aio_complete(dio->iocb, ret, 0);
> +	}
> +
> +	if (dio->flags & DIO_LOCKING)
> +		/* lockdep: non-owner release */
> +		up_read_non_owner(&dio->inode->i_alloc_sem);
> +
>  	return ret;
>  }
>  
> @@ -277,8 +280,7 @@ static void dio_bio_end_aio(struct bio *
>  	spin_unlock_irqrestore(&dio->bio_lock, flags);
>  
>  	if (remaining == 0) {
> -		int ret = dio_complete(dio, dio->iocb->ki_pos, 0);
> -		aio_complete(dio->iocb, ret, 0);
> +		dio_complete(dio, dio->iocb->ki_pos, 0, true);
>  		kfree(dio);
>  	}
>  }
> @@ -1126,7 +1128,7 @@ direct_io_worker(int rw, struct kiocb *i
>  	spin_unlock_irqrestore(&dio->bio_lock, flags);
>  
>  	if (ret2 == 0) {
> -		ret = dio_complete(dio, offset, ret);
> +		ret = dio_complete(dio, offset, ret, false);
>  		kfree(dio);
>  	} else
>  		BUG_ON(ret != -EIOCBQUEUED);
> Index: linux-2.6/fs/ext4/inode.c
> ===================================================================
> --- linux-2.6.orig/fs/ext4/inode.c	2010-06-22 09:48:37.249004508 +0200
> +++ linux-2.6/fs/ext4/inode.c	2010-06-22 12:18:45.883255381 +0200
> @@ -3775,7 +3775,8 @@ static ext4_io_end_t *ext4_init_io_end (
>  }
>  
>  static void ext4_end_io_dio(struct kiocb *iocb, loff_t offset,
> -			    ssize_t size, void *private)
> +			    ssize_t size, void *private, int ret,
> +			    bool is_async)
>  {
>          ext4_io_end_t *io_end = iocb->private;
>  	struct workqueue_struct *wq;
> @@ -3784,7 +3785,7 @@ static void ext4_end_io_dio(struct kiocb
>  
>  	/* if not async direct IO or dio with 0 bytes write, just return */
>  	if (!io_end || !size)
> -		return;
> +		goto out;
>  
>  	ext_debug("ext4_end_io_dio(): io_end 0x%p"
>  		  "for inode %lu, iocb 0x%p, offset %llu, size %llu\n",
> @@ -3795,7 +3796,7 @@ static void ext4_end_io_dio(struct kiocb
>  	if (io_end->flag != EXT4_IO_UNWRITTEN){
>  		ext4_free_io_end(io_end);
>  		iocb->private = NULL;
> -		return;
> +		goto out;
>  	}
>  
>  	io_end->offset = offset;
> @@ -3812,6 +3813,9 @@ static void ext4_end_io_dio(struct kiocb
>  	list_add_tail(&io_end->list, &ei->i_completed_io_list);
>  	spin_unlock_irqrestore(&ei->i_completed_io_lock, flags);
>  	iocb->private = NULL;
> +out:
> +	if (is_async)
> +		aio_complete(iocb, ret, 0);
>  }
>  
>  static void ext4_end_io_buffer_write(struct buffer_head *bh, int uptodate)
> Index: linux-2.6/fs/ocfs2/aops.c
> ===================================================================
> --- linux-2.6.orig/fs/ocfs2/aops.c	2010-06-22 09:48:37.259012749 +0200
> +++ linux-2.6/fs/ocfs2/aops.c	2010-06-22 12:19:03.931005757 +0200
> @@ -609,7 +609,9 @@ bail:
>  static void ocfs2_dio_end_io(struct kiocb *iocb,
>  			     loff_t offset,
>  			     ssize_t bytes,
> -			     void *private)
> +			     void *private,
> +			     int ret,
> +			     bool is_async)
>  {
>  	struct inode *inode = iocb->ki_filp->f_path.dentry->d_inode;
>  	int level;
> @@ -623,6 +625,9 @@ static void ocfs2_dio_end_io(struct kioc
>  	if (!level)
>  		up_read(&inode->i_alloc_sem);
>  	ocfs2_rw_unlock(inode, level);
> +
> +	if (is_async)
> +		aio_complete(iocb, ret, 0);
>  }
>  
>  /*
> Index: linux-2.6/fs/xfs/linux-2.6/xfs_aops.c
> ===================================================================
> --- linux-2.6.orig/fs/xfs/linux-2.6/xfs_aops.c	2010-06-22 09:48:37.268012190 +0200
> +++ linux-2.6/fs/xfs/linux-2.6/xfs_aops.c	2010-06-22 14:05:02.522005722 +0200
> @@ -1599,7 +1599,9 @@ xfs_end_io_direct(
>  	struct kiocb	*iocb,
>  	loff_t		offset,
>  	ssize_t		size,
> -	void		*private)
> +	void		*private,
> +	int		ret,
> +	bool		is_async)
>  {
>  	xfs_ioend_t	*ioend = iocb->private;
>  
> @@ -1645,6 +1647,9 @@ xfs_end_io_direct(
>  	 * against double-freeing.
>  	 */
>  	iocb->private = NULL;
> +
> +	if (is_async)
> +		aio_complete(iocb, ret, 0);
>  }
>  
>  STATIC ssize_t
> Index: linux-2.6/fs/xfs/linux-2.6/xfs_aops.h
> ===================================================================
> --- linux-2.6.orig/fs/xfs/linux-2.6/xfs_aops.h	2010-06-22 09:48:37.278274238 +0200
> +++ linux-2.6/fs/xfs/linux-2.6/xfs_aops.h	2010-06-22 09:49:12.388034051 +0200
> @@ -37,6 +37,8 @@ typedef struct xfs_ioend {
>  	size_t			io_size;	/* size of the extent */
>  	xfs_off_t		io_offset;	/* offset in the file */
>  	struct work_struct	io_work;	/* xfsdatad work queue */
> +	struct kiocb		*io_iocb;
> +	int			io_result;
>  } xfs_ioend_t;
>  
>  extern const struct address_space_operations xfs_address_space_operations;
> Index: linux-2.6/include/linux/fs.h
> ===================================================================
> --- linux-2.6.orig/include/linux/fs.h	2010-06-22 09:49:07.188253984 +0200
> +++ linux-2.6/include/linux/fs.h	2010-06-22 10:34:10.128005975 +0200
> @@ -415,7 +415,8 @@ struct buffer_head;
>  typedef int (get_block_t)(struct inode *inode, sector_t iblock,
>  			struct buffer_head *bh_result, int create);
>  typedef void (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
> -			ssize_t bytes, void *private);
> +			ssize_t bytes, void *private, int ret,
> +			bool is_async);
>  
>  /*
>   * Attribute flags.  These should be or-ed together to figure out what
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] direct-io: move aio_complete into ->end_io
  2010-06-24 21:59     ` Jan Kara
@ 2010-06-25  6:36       ` Christoph Hellwig
  -1 siblings, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2010-06-25  6:36 UTC (permalink / raw)
  To: Jan Kara; +Cc: Christoph Hellwig, linux-fsdevel, xfs, linux-ext4

On Thu, Jun 24, 2010 at 11:59:22PM +0200, Jan Kara wrote:
>   Umm, I don't get this. Looking at the ->end_io callback it has been
> always called with i_alloc_sem held. It's only aio_complete() which will
> be called with i_alloc_sem held after your changes. Or am I missing
> something?

No, that part of the commit message is flat out wrong.  Not sure what
I was thinking when I wrote it.

>   Moreover the async testing you do does not seem to be completely right.
> dio->is_async is a flag that controls whether dio code waits for IO to be
> completed or not. In particular it is not set for AIO that spans beyond
> current i_size so it does not seem to be exactly what you need (at least
> for ext4 it isn't). I think that is_sync_kiocb() is a test that should be
> used to recognize AIO - and that has an advantage that you don't have to
> pass the is_async flag around.

No.  is_sync_kiocb() means the ioctb was not intended as sync I/O from
the start.  But we can only call aio_complete when we returned
-EIOCBQUEUED from ->aio_read/write.  Take a look at the comment near the
end of direct_io_worker().

AIO beyond i_size is not supported using blockdev_direct_IO yet.  I
think I can add it fairly easily for XFS, but that will require
passing a new DIO_* flag to __blockdev_direct_IO which will make
is_async true for writes beyond i_size.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] direct-io: move aio_complete into ->end_io
@ 2010-06-25  6:36       ` Christoph Hellwig
  0 siblings, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2010-06-25  6:36 UTC (permalink / raw)
  To: Jan Kara; +Cc: Christoph Hellwig, linux-fsdevel, linux-ext4, xfs

On Thu, Jun 24, 2010 at 11:59:22PM +0200, Jan Kara wrote:
>   Umm, I don't get this. Looking at the ->end_io callback it has been
> always called with i_alloc_sem held. It's only aio_complete() which will
> be called with i_alloc_sem held after your changes. Or am I missing
> something?

No, that part of the commit message is flat out wrong.  Not sure what
I was thinking when I wrote it.

>   Moreover the async testing you do does not seem to be completely right.
> dio->is_async is a flag that controls whether dio code waits for IO to be
> completed or not. In particular it is not set for AIO that spans beyond
> current i_size so it does not seem to be exactly what you need (at least
> for ext4 it isn't). I think that is_sync_kiocb() is a test that should be
> used to recognize AIO - and that has an advantage that you don't have to
> pass the is_async flag around.

No.  is_sync_kiocb() means the ioctb was not intended as sync I/O from
the start.  But we can only call aio_complete when we returned
-EIOCBQUEUED from ->aio_read/write.  Take a look at the comment near the
end of direct_io_worker().

AIO beyond i_size is not supported using blockdev_direct_IO yet.  I
think I can add it fairly easily for XFS, but that will require
passing a new DIO_* flag to __blockdev_direct_IO which will make
is_async true for writes beyond i_size.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] direct-io: move aio_complete into ->end_io
  2010-06-25  6:36       ` Christoph Hellwig
@ 2010-06-25 10:35         ` Jan Kara
  -1 siblings, 0 replies; 18+ messages in thread
From: Jan Kara @ 2010-06-25 10:35 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jan Kara, linux-fsdevel, xfs, linux-ext4

On Fri 25-06-10 02:36:10, Christoph Hellwig wrote:
> On Thu, Jun 24, 2010 at 11:59:22PM +0200, Jan Kara wrote:
> >   Moreover the async testing you do does not seem to be completely right.
> > dio->is_async is a flag that controls whether dio code waits for IO to be
> > completed or not. In particular it is not set for AIO that spans beyond
> > current i_size so it does not seem to be exactly what you need (at least
> > for ext4 it isn't). I think that is_sync_kiocb() is a test that should be
> > used to recognize AIO - and that has an advantage that you don't have to
> > pass the is_async flag around.
> 
> No.  is_sync_kiocb() means the ioctb was not intended as sync I/O from
> the start.  But we can only call aio_complete when we returned
> -EIOCBQUEUED from ->aio_read/write.  Take a look at the comment near the
> end of direct_io_worker().
  Ah, I see. Thanks for explanation. It's ugly but I also don't see a
nicer way how to handle this.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] direct-io: move aio_complete into ->end_io
@ 2010-06-25 10:35         ` Jan Kara
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Kara @ 2010-06-25 10:35 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, linux-ext4, Jan Kara, xfs

On Fri 25-06-10 02:36:10, Christoph Hellwig wrote:
> On Thu, Jun 24, 2010 at 11:59:22PM +0200, Jan Kara wrote:
> >   Moreover the async testing you do does not seem to be completely right.
> > dio->is_async is a flag that controls whether dio code waits for IO to be
> > completed or not. In particular it is not set for AIO that spans beyond
> > current i_size so it does not seem to be exactly what you need (at least
> > for ext4 it isn't). I think that is_sync_kiocb() is a test that should be
> > used to recognize AIO - and that has an advantage that you don't have to
> > pass the is_async flag around.
> 
> No.  is_sync_kiocb() means the ioctb was not intended as sync I/O from
> the start.  But we can only call aio_complete when we returned
> -EIOCBQUEUED from ->aio_read/write.  Take a look at the comment near the
> end of direct_io_worker().
  Ah, I see. Thanks for explanation. It's ugly but I also don't see a
nicer way how to handle this.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/2] Fix aio completion vs unwritten extents
  2010-06-22 12:21 ` Christoph Hellwig
@ 2010-07-16  6:04   ` Christoph Hellwig
  -1 siblings, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2010-07-16  6:04 UTC (permalink / raw)
  To: linux-fsdevel, xfs, linux-ext4

Given that this still hasn't been picked up in any other tree would
people mind if we push patches 1 and 2 through the XFS tree?

I some more changes that sit ontop of this, and it would make my
life a lot easier.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/2] Fix aio completion vs unwritten extents
@ 2010-07-16  6:04   ` Christoph Hellwig
  0 siblings, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2010-07-16  6:04 UTC (permalink / raw)
  To: linux-fsdevel, xfs, linux-ext4

Given that this still hasn't been picked up in any other tree would
people mind if we push patches 1 and 2 through the XFS tree?

I some more changes that sit ontop of this, and it would make my
life a lot easier.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/2] Fix aio completion vs unwritten extents
  2010-07-16  6:04   ` Christoph Hellwig
@ 2010-07-16  6:30     ` Theodore Tso
  -1 siblings, 0 replies; 18+ messages in thread
From: Theodore Tso @ 2010-07-16  6:30 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, xfs, linux-ext4


On Jul 16, 2010, at 2:04 AM, Christoph Hellwig wrote:

> Given that this still hasn't been picked up in any other tree would
> people mind if we push patches 1 and 2 through the XFS tree?
> 
> I some more changes that sit ontop of this, and it would make my
> life a lot easier.


Thanks for bringing this up.  I was going to ask if you had any changes
in patch1 of this series since I was about to put them into the ext4 tree
and I didn't want to have any merge conflicts (or have to force a tree
rewind/rebase) if it turned out if there had been some changes and
some other tree landed in Linus's tree first.

In other words, since we both have patches that depend on your
first patch, one easy way of handling things is that we both put them
into our respective fs trees, and as long as the patch doesn't change
git should do the right thing when Steve or Linus merges them into their
linux-next or linus trees, respectively.

Do you have any objections with this?

									- Ted


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/2] Fix aio completion vs unwritten extents
@ 2010-07-16  6:30     ` Theodore Tso
  0 siblings, 0 replies; 18+ messages in thread
From: Theodore Tso @ 2010-07-16  6:30 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, linux-ext4, xfs


On Jul 16, 2010, at 2:04 AM, Christoph Hellwig wrote:

> Given that this still hasn't been picked up in any other tree would
> people mind if we push patches 1 and 2 through the XFS tree?
> 
> I some more changes that sit ontop of this, and it would make my
> life a lot easier.


Thanks for bringing this up.  I was going to ask if you had any changes
in patch1 of this series since I was about to put them into the ext4 tree
and I didn't want to have any merge conflicts (or have to force a tree
rewind/rebase) if it turned out if there had been some changes and
some other tree landed in Linus's tree first.

In other words, since we both have patches that depend on your
first patch, one easy way of handling things is that we both put them
into our respective fs trees, and as long as the patch doesn't change
git should do the right thing when Steve or Linus merges them into their
linux-next or linus trees, respectively.

Do you have any objections with this?

									- Ted

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/2] Fix aio completion vs unwritten extents
  2010-07-16  6:30     ` Theodore Tso
@ 2010-07-18  5:00       ` Christoph Hellwig
  -1 siblings, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2010-07-18  5:00 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Christoph Hellwig, linux-fsdevel, xfs, linux-ext4

On Fri, Jul 16, 2010 at 02:30:07AM -0400, Theodore Tso wrote:
> Thanks for bringing this up.  I was going to ask if you had any changes
> in patch1 of this series since I was about to put them into the ext4 tree
> and I didn't want to have any merge conflicts (or have to force a tree
> rewind/rebase) if it turned out if there had been some changes and
> some other tree landed in Linus's tree first.
> 
> In other words, since we both have patches that depend on your
> first patch, one easy way of handling things is that we both put them
> into our respective fs trees, and as long as the patch doesn't change
> git should do the right thing when Steve or Linus merges them into their
> linux-next or linus trees, respectively.
> 
> Do you have any objections with this?

Sounds fine to me.  The patch is final content-wise, but Jan pointed out
that the patch description wasn't quite correct.  Here's a version with
the updated patch description:

---

From: Christoph Hellwig <hch@lst.de>
Subject: direct-io: move aio_complete into ->end_io

Filesystems with unwritten extent support must not complete an AIO request
until the transaction to convert the extent has been commited.  That means
the aio_complete calls needs to be moved into the ->end_io callback so
that the filesystem can control when to call it exactly.

This makes a bit of a mess out of dio_complete and the ->end_io callback
prototype even more complicated. 

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz> 

Index: linux-2.6/fs/direct-io.c
===================================================================
--- linux-2.6.orig/fs/direct-io.c	2010-06-22 09:48:37.239004298 +0200
+++ linux-2.6/fs/direct-io.c	2010-06-22 11:54:42.281003878 +0200
@@ -218,7 +218,7 @@ static struct page *dio_get_page(struct
  * filesystems can use it to hold additional state between get_block calls and
  * dio_complete.
  */
-static int dio_complete(struct dio *dio, loff_t offset, int ret)
+static int dio_complete(struct dio *dio, loff_t offset, int ret, bool is_async)
 {
 	ssize_t transferred = 0;
 
@@ -239,14 +239,6 @@ static int dio_complete(struct dio *dio,
 			transferred = dio->i_size - offset;
 	}
 
-	if (dio->end_io && dio->result)
-		dio->end_io(dio->iocb, offset, transferred,
-			    dio->map_bh.b_private);
-
-	if (dio->flags & DIO_LOCKING)
-		/* lockdep: non-owner release */
-		up_read_non_owner(&dio->inode->i_alloc_sem);
-
 	if (ret == 0)
 		ret = dio->page_errors;
 	if (ret == 0)
@@ -254,6 +246,17 @@ static int dio_complete(struct dio *dio,
 	if (ret == 0)
 		ret = transferred;
 
+	if (dio->end_io && dio->result) {
+		dio->end_io(dio->iocb, offset, transferred,
+			    dio->map_bh.b_private, ret, is_async);
+	} else if (is_async) {
+		aio_complete(dio->iocb, ret, 0);
+	}
+
+	if (dio->flags & DIO_LOCKING)
+		/* lockdep: non-owner release */
+		up_read_non_owner(&dio->inode->i_alloc_sem);
+
 	return ret;
 }
 
@@ -277,8 +280,7 @@ static void dio_bio_end_aio(struct bio *
 	spin_unlock_irqrestore(&dio->bio_lock, flags);
 
 	if (remaining == 0) {
-		int ret = dio_complete(dio, dio->iocb->ki_pos, 0);
-		aio_complete(dio->iocb, ret, 0);
+		dio_complete(dio, dio->iocb->ki_pos, 0, true);
 		kfree(dio);
 	}
 }
@@ -1126,7 +1128,7 @@ direct_io_worker(int rw, struct kiocb *i
 	spin_unlock_irqrestore(&dio->bio_lock, flags);
 
 	if (ret2 == 0) {
-		ret = dio_complete(dio, offset, ret);
+		ret = dio_complete(dio, offset, ret, false);
 		kfree(dio);
 	} else
 		BUG_ON(ret != -EIOCBQUEUED);
Index: linux-2.6/fs/ext4/inode.c
===================================================================
--- linux-2.6.orig/fs/ext4/inode.c	2010-06-22 09:48:37.249004508 +0200
+++ linux-2.6/fs/ext4/inode.c	2010-06-22 12:18:45.883255381 +0200
@@ -3775,7 +3775,8 @@ static ext4_io_end_t *ext4_init_io_end (
 }
 
 static void ext4_end_io_dio(struct kiocb *iocb, loff_t offset,
-			    ssize_t size, void *private)
+			    ssize_t size, void *private, int ret,
+			    bool is_async)
 {
         ext4_io_end_t *io_end = iocb->private;
 	struct workqueue_struct *wq;
@@ -3784,7 +3785,7 @@ static void ext4_end_io_dio(struct kiocb
 
 	/* if not async direct IO or dio with 0 bytes write, just return */
 	if (!io_end || !size)
-		return;
+		goto out;
 
 	ext_debug("ext4_end_io_dio(): io_end 0x%p"
 		  "for inode %lu, iocb 0x%p, offset %llu, size %llu\n",
@@ -3795,7 +3796,7 @@ static void ext4_end_io_dio(struct kiocb
 	if (io_end->flag != EXT4_IO_UNWRITTEN){
 		ext4_free_io_end(io_end);
 		iocb->private = NULL;
-		return;
+		goto out;
 	}
 
 	io_end->offset = offset;
@@ -3812,6 +3813,9 @@ static void ext4_end_io_dio(struct kiocb
 	list_add_tail(&io_end->list, &ei->i_completed_io_list);
 	spin_unlock_irqrestore(&ei->i_completed_io_lock, flags);
 	iocb->private = NULL;
+out:
+	if (is_async)
+		aio_complete(iocb, ret, 0);
 }
 
 static void ext4_end_io_buffer_write(struct buffer_head *bh, int uptodate)
Index: linux-2.6/fs/ocfs2/aops.c
===================================================================
--- linux-2.6.orig/fs/ocfs2/aops.c	2010-06-22 09:48:37.259012749 +0200
+++ linux-2.6/fs/ocfs2/aops.c	2010-06-22 12:19:03.931005757 +0200
@@ -609,7 +609,9 @@ bail:
 static void ocfs2_dio_end_io(struct kiocb *iocb,
 			     loff_t offset,
 			     ssize_t bytes,
-			     void *private)
+			     void *private,
+			     int ret,
+			     bool is_async)
 {
 	struct inode *inode = iocb->ki_filp->f_path.dentry->d_inode;
 	int level;
@@ -623,6 +625,9 @@ static void ocfs2_dio_end_io(struct kioc
 	if (!level)
 		up_read(&inode->i_alloc_sem);
 	ocfs2_rw_unlock(inode, level);
+
+	if (is_async)
+		aio_complete(iocb, ret, 0);
 }
 
 /*
Index: linux-2.6/fs/xfs/linux-2.6/xfs_aops.c
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_aops.c	2010-06-22 09:48:37.268012190 +0200
+++ linux-2.6/fs/xfs/linux-2.6/xfs_aops.c	2010-06-22 14:05:02.522005722 +0200
@@ -1599,7 +1599,9 @@ xfs_end_io_direct(
 	struct kiocb	*iocb,
 	loff_t		offset,
 	ssize_t		size,
-	void		*private)
+	void		*private,
+	int		ret,
+	bool		is_async)
 {
 	xfs_ioend_t	*ioend = iocb->private;
 
@@ -1645,6 +1647,9 @@ xfs_end_io_direct(
 	 * against double-freeing.
 	 */
 	iocb->private = NULL;
+
+	if (is_async)
+		aio_complete(iocb, ret, 0);
 }
 
 STATIC ssize_t
Index: linux-2.6/fs/xfs/linux-2.6/xfs_aops.h
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_aops.h	2010-06-22 09:48:37.278274238 +0200
+++ linux-2.6/fs/xfs/linux-2.6/xfs_aops.h	2010-06-22 09:49:12.388034051 +0200
@@ -37,6 +37,8 @@ typedef struct xfs_ioend {
 	size_t			io_size;	/* size of the extent */
 	xfs_off_t		io_offset;	/* offset in the file */
 	struct work_struct	io_work;	/* xfsdatad work queue */
+	struct kiocb		*io_iocb;
+	int			io_result;
 } xfs_ioend_t;
 
 extern const struct address_space_operations xfs_address_space_operations;
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h	2010-06-22 09:49:07.188253984 +0200
+++ linux-2.6/include/linux/fs.h	2010-06-22 10:34:10.128005975 +0200
@@ -415,7 +415,8 @@ struct buffer_head;
 typedef int (get_block_t)(struct inode *inode, sector_t iblock,
 			struct buffer_head *bh_result, int create);
 typedef void (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
-			ssize_t bytes, void *private);
+			ssize_t bytes, void *private, int ret,
+			bool is_async);
 
 /*
  * Attribute flags.  These should be or-ed together to figure out what

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/2] Fix aio completion vs unwritten extents
@ 2010-07-18  5:00       ` Christoph Hellwig
  0 siblings, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2010-07-18  5:00 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Christoph Hellwig, linux-fsdevel, linux-ext4, xfs

On Fri, Jul 16, 2010 at 02:30:07AM -0400, Theodore Tso wrote:
> Thanks for bringing this up.  I was going to ask if you had any changes
> in patch1 of this series since I was about to put them into the ext4 tree
> and I didn't want to have any merge conflicts (or have to force a tree
> rewind/rebase) if it turned out if there had been some changes and
> some other tree landed in Linus's tree first.
> 
> In other words, since we both have patches that depend on your
> first patch, one easy way of handling things is that we both put them
> into our respective fs trees, and as long as the patch doesn't change
> git should do the right thing when Steve or Linus merges them into their
> linux-next or linus trees, respectively.
> 
> Do you have any objections with this?

Sounds fine to me.  The patch is final content-wise, but Jan pointed out
that the patch description wasn't quite correct.  Here's a version with
the updated patch description:

---

From: Christoph Hellwig <hch@lst.de>
Subject: direct-io: move aio_complete into ->end_io

Filesystems with unwritten extent support must not complete an AIO request
until the transaction to convert the extent has been commited.  That means
the aio_complete calls needs to be moved into the ->end_io callback so
that the filesystem can control when to call it exactly.

This makes a bit of a mess out of dio_complete and the ->end_io callback
prototype even more complicated. 

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz> 

Index: linux-2.6/fs/direct-io.c
===================================================================
--- linux-2.6.orig/fs/direct-io.c	2010-06-22 09:48:37.239004298 +0200
+++ linux-2.6/fs/direct-io.c	2010-06-22 11:54:42.281003878 +0200
@@ -218,7 +218,7 @@ static struct page *dio_get_page(struct
  * filesystems can use it to hold additional state between get_block calls and
  * dio_complete.
  */
-static int dio_complete(struct dio *dio, loff_t offset, int ret)
+static int dio_complete(struct dio *dio, loff_t offset, int ret, bool is_async)
 {
 	ssize_t transferred = 0;
 
@@ -239,14 +239,6 @@ static int dio_complete(struct dio *dio,
 			transferred = dio->i_size - offset;
 	}
 
-	if (dio->end_io && dio->result)
-		dio->end_io(dio->iocb, offset, transferred,
-			    dio->map_bh.b_private);
-
-	if (dio->flags & DIO_LOCKING)
-		/* lockdep: non-owner release */
-		up_read_non_owner(&dio->inode->i_alloc_sem);
-
 	if (ret == 0)
 		ret = dio->page_errors;
 	if (ret == 0)
@@ -254,6 +246,17 @@ static int dio_complete(struct dio *dio,
 	if (ret == 0)
 		ret = transferred;
 
+	if (dio->end_io && dio->result) {
+		dio->end_io(dio->iocb, offset, transferred,
+			    dio->map_bh.b_private, ret, is_async);
+	} else if (is_async) {
+		aio_complete(dio->iocb, ret, 0);
+	}
+
+	if (dio->flags & DIO_LOCKING)
+		/* lockdep: non-owner release */
+		up_read_non_owner(&dio->inode->i_alloc_sem);
+
 	return ret;
 }
 
@@ -277,8 +280,7 @@ static void dio_bio_end_aio(struct bio *
 	spin_unlock_irqrestore(&dio->bio_lock, flags);
 
 	if (remaining == 0) {
-		int ret = dio_complete(dio, dio->iocb->ki_pos, 0);
-		aio_complete(dio->iocb, ret, 0);
+		dio_complete(dio, dio->iocb->ki_pos, 0, true);
 		kfree(dio);
 	}
 }
@@ -1126,7 +1128,7 @@ direct_io_worker(int rw, struct kiocb *i
 	spin_unlock_irqrestore(&dio->bio_lock, flags);
 
 	if (ret2 == 0) {
-		ret = dio_complete(dio, offset, ret);
+		ret = dio_complete(dio, offset, ret, false);
 		kfree(dio);
 	} else
 		BUG_ON(ret != -EIOCBQUEUED);
Index: linux-2.6/fs/ext4/inode.c
===================================================================
--- linux-2.6.orig/fs/ext4/inode.c	2010-06-22 09:48:37.249004508 +0200
+++ linux-2.6/fs/ext4/inode.c	2010-06-22 12:18:45.883255381 +0200
@@ -3775,7 +3775,8 @@ static ext4_io_end_t *ext4_init_io_end (
 }
 
 static void ext4_end_io_dio(struct kiocb *iocb, loff_t offset,
-			    ssize_t size, void *private)
+			    ssize_t size, void *private, int ret,
+			    bool is_async)
 {
         ext4_io_end_t *io_end = iocb->private;
 	struct workqueue_struct *wq;
@@ -3784,7 +3785,7 @@ static void ext4_end_io_dio(struct kiocb
 
 	/* if not async direct IO or dio with 0 bytes write, just return */
 	if (!io_end || !size)
-		return;
+		goto out;
 
 	ext_debug("ext4_end_io_dio(): io_end 0x%p"
 		  "for inode %lu, iocb 0x%p, offset %llu, size %llu\n",
@@ -3795,7 +3796,7 @@ static void ext4_end_io_dio(struct kiocb
 	if (io_end->flag != EXT4_IO_UNWRITTEN){
 		ext4_free_io_end(io_end);
 		iocb->private = NULL;
-		return;
+		goto out;
 	}
 
 	io_end->offset = offset;
@@ -3812,6 +3813,9 @@ static void ext4_end_io_dio(struct kiocb
 	list_add_tail(&io_end->list, &ei->i_completed_io_list);
 	spin_unlock_irqrestore(&ei->i_completed_io_lock, flags);
 	iocb->private = NULL;
+out:
+	if (is_async)
+		aio_complete(iocb, ret, 0);
 }
 
 static void ext4_end_io_buffer_write(struct buffer_head *bh, int uptodate)
Index: linux-2.6/fs/ocfs2/aops.c
===================================================================
--- linux-2.6.orig/fs/ocfs2/aops.c	2010-06-22 09:48:37.259012749 +0200
+++ linux-2.6/fs/ocfs2/aops.c	2010-06-22 12:19:03.931005757 +0200
@@ -609,7 +609,9 @@ bail:
 static void ocfs2_dio_end_io(struct kiocb *iocb,
 			     loff_t offset,
 			     ssize_t bytes,
-			     void *private)
+			     void *private,
+			     int ret,
+			     bool is_async)
 {
 	struct inode *inode = iocb->ki_filp->f_path.dentry->d_inode;
 	int level;
@@ -623,6 +625,9 @@ static void ocfs2_dio_end_io(struct kioc
 	if (!level)
 		up_read(&inode->i_alloc_sem);
 	ocfs2_rw_unlock(inode, level);
+
+	if (is_async)
+		aio_complete(iocb, ret, 0);
 }
 
 /*
Index: linux-2.6/fs/xfs/linux-2.6/xfs_aops.c
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_aops.c	2010-06-22 09:48:37.268012190 +0200
+++ linux-2.6/fs/xfs/linux-2.6/xfs_aops.c	2010-06-22 14:05:02.522005722 +0200
@@ -1599,7 +1599,9 @@ xfs_end_io_direct(
 	struct kiocb	*iocb,
 	loff_t		offset,
 	ssize_t		size,
-	void		*private)
+	void		*private,
+	int		ret,
+	bool		is_async)
 {
 	xfs_ioend_t	*ioend = iocb->private;
 
@@ -1645,6 +1647,9 @@ xfs_end_io_direct(
 	 * against double-freeing.
 	 */
 	iocb->private = NULL;
+
+	if (is_async)
+		aio_complete(iocb, ret, 0);
 }
 
 STATIC ssize_t
Index: linux-2.6/fs/xfs/linux-2.6/xfs_aops.h
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_aops.h	2010-06-22 09:48:37.278274238 +0200
+++ linux-2.6/fs/xfs/linux-2.6/xfs_aops.h	2010-06-22 09:49:12.388034051 +0200
@@ -37,6 +37,8 @@ typedef struct xfs_ioend {
 	size_t			io_size;	/* size of the extent */
 	xfs_off_t		io_offset;	/* offset in the file */
 	struct work_struct	io_work;	/* xfsdatad work queue */
+	struct kiocb		*io_iocb;
+	int			io_result;
 } xfs_ioend_t;
 
 extern const struct address_space_operations xfs_address_space_operations;
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h	2010-06-22 09:49:07.188253984 +0200
+++ linux-2.6/include/linux/fs.h	2010-06-22 10:34:10.128005975 +0200
@@ -415,7 +415,8 @@ struct buffer_head;
 typedef int (get_block_t)(struct inode *inode, sector_t iblock,
 			struct buffer_head *bh_result, int create);
 typedef void (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
-			ssize_t bytes, void *private);
+			ssize_t bytes, void *private, int ret,
+			bool is_async);
 
 /*
  * Attribute flags.  These should be or-ed together to figure out what

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2010-07-18  5:00 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-06-22 12:21 [PATCH 0/2] Fix aio completion vs unwritten extents Christoph Hellwig
2010-06-22 12:21 ` Christoph Hellwig
2010-06-22 12:21 ` [PATCH 1/2] direct-io: move aio_complete into ->end_io Christoph Hellwig
2010-06-22 12:21   ` Christoph Hellwig
2010-06-24 21:59   ` Jan Kara
2010-06-24 21:59     ` Jan Kara
2010-06-25  6:36     ` Christoph Hellwig
2010-06-25  6:36       ` Christoph Hellwig
2010-06-25 10:35       ` Jan Kara
2010-06-25 10:35         ` Jan Kara
2010-06-22 12:21 ` [PATCH 2/2] xfs: move aio completion after unwritten extent conversion Christoph Hellwig
2010-06-22 12:21   ` Christoph Hellwig
2010-07-16  6:04 ` [PATCH 0/2] Fix aio completion vs unwritten extents Christoph Hellwig
2010-07-16  6:04   ` Christoph Hellwig
2010-07-16  6:30   ` Theodore Tso
2010-07-16  6:30     ` Theodore Tso
2010-07-18  5:00     ` Christoph Hellwig
2010-07-18  5:00       ` Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.