[PATCH 1/6] fs: allow short direct-io reads to be completed via buffered IO

linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 1/6] fs: allow short direct-io reads to be completed via buffered IO
@ 2010-05-21 17:03 Josef Bacik
  2010-05-21 17:03 ` [PATCH 2/6] direct-io: add a hook for the fs to provide its own submit_bio function Josef Bacik
                   ` (4 more replies)
  0 siblings, 5 replies; 19+ messages in thread
From: Josef Bacik @ 2010-05-21 17:03 UTC (permalink / raw)
  To: linux-btrfs, linux-fsdevel, linux-kernel, hch, akpm

This is similar to what already happens in the write case.  If we have a short
read while doing O_DIRECT, instead of just returning, fallthrough and try to
read the rest via buffered IO.  BTRFS needs this because if we encounter a
compressed or inline extent during DIO, we need to fallback on buffered.  If the
extent is compressed we need to read the entire thing into memory and
de-compress it into the users pages.  I have tested this with fsx and everything
works great.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
---

V1->V2: Check to see if our current ppos is >= i_size after a short DIO read,
just in case it was actually a short read and we need to just return.

 mm/filemap.c |   36 +++++++++++++++++++++++++++++++-----
 1 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 140ebda..829ac9c 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1263,7 +1263,7 @@ generic_file_aio_read(struct kiocb *iocb, const struct iovec *iov,
 {
 	struct file *filp = iocb->ki_filp;
 	ssize_t retval;
-	unsigned long seg;
+	unsigned long seg = 0;
 	size_t count;
 	loff_t *ppos = &iocb->ki_pos;
 
@@ -1290,21 +1290,47 @@ generic_file_aio_read(struct kiocb *iocb, const struct iovec *iov,
 				retval = mapping->a_ops->direct_IO(READ, iocb,
 							iov, pos, nr_segs);
 			}
-			if (retval > 0)
+			if (retval > 0) {
 				*ppos = pos + retval;
-			if (retval) {
+				count -= retval;
+			}
+
+			/*
+			 * Btrfs can have a short DIO read if we encounter
+			 * compressed extents, so if there was an error, or if
+			 * we've already read everything we wanted to, or if
+			 * there was a short read because we hit EOF, go ahead
+			 * and return.  Otherwise fallthrough to buffered io for
+			 * the rest of the read.
+			 */
+			if (retval < 0 || !count || *ppos >= size) {
 				file_accessed(filp);
 				goto out;
 			}
 		}
 	}
 
+	count = retval;
 	for (seg = 0; seg < nr_segs; seg++) {
 		read_descriptor_t desc;
+		loff_t offset = 0;
+
+		/*
+		 * If we did a short DIO read we need to skip the section of the
+		 * iov that we've already read data into.
+		 */
+		if (count) {
+			if (count > iov[seg].iov_len) {
+				count -= iov[seg].iov_len;
+				continue;
+			}
+			offset = count;
+			count = 0;
+		}
 
 		desc.written = 0;
-		desc.arg.buf = iov[seg].iov_base;
-		desc.count = iov[seg].iov_len;
+		desc.arg.buf = iov[seg].iov_base + offset;
+		desc.count = iov[seg].iov_len - offset;
 		if (desc.count == 0)
 			continue;
 		desc.error = 0;
-- 
1.6.6.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/6] direct-io: add a hook for the fs to provide its own submit_bio function
  2010-05-21 17:03 [PATCH 1/6] fs: allow short direct-io reads to be completed via buffered IO Josef Bacik
@ 2010-05-21 17:03 ` Josef Bacik
  2010-05-21 17:03 ` [PATCH 3/6] direct-io: do not merge logically non-contiguous requests Josef Bacik
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 19+ messages in thread
From: Josef Bacik @ 2010-05-21 17:03 UTC (permalink / raw)
  To: linux-btrfs, linux-fsdevel, linux-kernel, hch, akpm

Because BTRFS can do RAID and such, we need our own submit hook so we can setup
the bio's in the correct fashion, and handle checksum errors properly.  So there
are a few changes here

1) The submit_io hook.  This is straightforward, just call this instead of
submit_bio.

2) Allow the fs to return -ENOTBLK for reads.  Usually this has only worked for
writes, since writes can fallback onto buffered IO.  But BTRFS needs the option
of falling back on buffered IO if it encounters a compressed extent, since we
need to read the entire extent in and decompress it.  So if we get -ENOTBLK back
from get_block we'll return back and fallback on buffered just like the write
case.

I've tested these changes with fsx and everything seems to work.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
---
V1->V2:
-Changed dio_end_io to EXPORT_SYMBOL_GPL
-Removed the own_submit blockdev dio helper
-Removed the boundary change

V2->V3:
-Keep track of the logical offset in the current bio

V3->V4:
-Kerneldoc dio_end_io
-switch if (!dio->submit_io) statement

 fs/direct-io.c     |   42 +++++++++++++++++++++++++++++++++++++-----
 include/linux/fs.h |   11 ++++++++---
 2 files changed, 45 insertions(+), 8 deletions(-)

diff --git a/fs/direct-io.c b/fs/direct-io.c
index e82adc2..5949947 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -82,6 +82,8 @@ struct dio {
 	int reap_counter;		/* rate limit reaping */
 	get_block_t *get_block;		/* block mapping function */
 	dio_iodone_t *end_io;		/* IO completion function */
+	dio_submit_t *submit_io;	/* IO submition function */
+	loff_t logical_offset_in_bio;	/* current first logical block in bio */
 	sector_t final_block_in_bio;	/* current final block in bio + 1 */
 	sector_t next_block_for_io;	/* next block to be put under IO,
 					   in dio_blocks units */
@@ -96,6 +98,7 @@ struct dio {
 	unsigned cur_page_offset;	/* Offset into it, in bytes */
 	unsigned cur_page_len;		/* Nr of bytes at cur_page_offset */
 	sector_t cur_page_block;	/* Where it starts */
+	loff_t cur_page_fs_offset;	/* Offset in file */
 
 	/* BIO completion state */
 	spinlock_t bio_lock;		/* protects BIO fields below */
@@ -300,6 +303,26 @@ static void dio_bio_end_io(struct bio *bio, int error)
 	spin_unlock_irqrestore(&dio->bio_lock, flags);
 }
 
+/**
+ * dio_end_io - handle the end io action for the given bio
+ * @bio: The direct io bio thats being completed
+ * @error: Error if there was one
+ *
+ * This is meant to be called by any filesystem that uses their own dio_submit_t
+ * so that the DIO specific endio actions are dealt with after the filesystem
+ * has done it's completion work.
+ */
+void dio_end_io(struct bio *bio, int error)
+{
+	struct dio *dio = bio->bi_private;
+
+	if (dio->is_async)
+		dio_bio_end_aio(bio, error);
+	else
+		dio_bio_end_io(bio, error);
+}
+EXPORT_SYMBOL_GPL(dio_end_io);
+
 static int
 dio_bio_alloc(struct dio *dio, struct block_device *bdev,
 		sector_t first_sector, int nr_vecs)
@@ -316,6 +339,7 @@ dio_bio_alloc(struct dio *dio, struct block_device *bdev,
 		bio->bi_end_io = dio_bio_end_io;
 
 	dio->bio = bio;
+	dio->logical_offset_in_bio = dio->cur_page_fs_offset;
 	return 0;
 }
 
@@ -340,10 +364,15 @@ static void dio_bio_submit(struct dio *dio)
 	if (dio->is_async && dio->rw == READ)
 		bio_set_pages_dirty(bio);
 
-	submit_bio(dio->rw, bio);
+	if (dio->submit_io)
+		dio->submit_io(dio->rw, bio, dio->inode,
+			       dio->logical_offset_in_bio);
+	else
+		submit_bio(dio->rw, bio);
 
 	dio->bio = NULL;
 	dio->boundary = 0;
+	dio->logical_offset_in_bio = 0;
 }
 
 /*
@@ -701,6 +730,7 @@ submit_page_section(struct dio *dio, struct page *page,
 	dio->cur_page_offset = offset;
 	dio->cur_page_len = len;
 	dio->cur_page_block = blocknr;
+	dio->cur_page_fs_offset = dio->block_in_file << dio->blkbits;
 out:
 	return ret;
 }
@@ -935,7 +965,7 @@ static ssize_t
 direct_io_worker(int rw, struct kiocb *iocb, struct inode *inode, 
 	const struct iovec *iov, loff_t offset, unsigned long nr_segs, 
 	unsigned blkbits, get_block_t get_block, dio_iodone_t end_io,
-	struct dio *dio)
+	dio_submit_t submit_io, struct dio *dio)
 {
 	unsigned long user_addr; 
 	unsigned long flags;
@@ -952,6 +982,7 @@ direct_io_worker(int rw, struct kiocb *iocb, struct inode *inode,
 
 	dio->get_block = get_block;
 	dio->end_io = end_io;
+	dio->submit_io = submit_io;
 	dio->final_block_in_bio = -1;
 	dio->next_block_for_io = -1;
 
@@ -1008,7 +1039,7 @@ direct_io_worker(int rw, struct kiocb *iocb, struct inode *inode,
 		}
 	} /* end iovec loop */
 
-	if (ret == -ENOTBLK && (rw & WRITE)) {
+	if (ret == -ENOTBLK) {
 		/*
 		 * The remaining part of the request will be
 		 * be handled by buffered I/O when we return
@@ -1110,7 +1141,7 @@ ssize_t
 __blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
 	struct block_device *bdev, const struct iovec *iov, loff_t offset, 
 	unsigned long nr_segs, get_block_t get_block, dio_iodone_t end_io,
-	int flags)
+	dio_submit_t submit_io,	int flags)
 {
 	int seg;
 	size_t size;
@@ -1197,7 +1228,8 @@ __blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
 		(end > i_size_read(inode)));
 
 	retval = direct_io_worker(rw, iocb, inode, iov, offset,
-				nr_segs, blkbits, get_block, end_io, dio);
+				nr_segs, blkbits, get_block, end_io,
+				submit_io, dio);
 
 	/*
 	 * In case of error extending write may have instantiated a few
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 4079ef9..f5dc079 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2250,10 +2250,15 @@ static inline int xip_truncate_page(struct address_space *mapping, loff_t from)
 #endif
 
 #ifdef CONFIG_BLOCK
+struct bio;
+typedef void (dio_submit_t)(int rw, struct bio *bio, struct inode *inode,
+			    loff_t file_offset);
+void dio_end_io(struct bio *bio, int error);
+
 ssize_t __blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
 	struct block_device *bdev, const struct iovec *iov, loff_t offset,
 	unsigned long nr_segs, get_block_t get_block, dio_iodone_t end_io,
-	int lock_type);
+	dio_submit_t submit_io,	int lock_type);
 
 enum {
 	/* need locking between buffered and direct access */
@@ -2269,7 +2274,7 @@ static inline ssize_t blockdev_direct_IO(int rw, struct kiocb *iocb,
 	dio_iodone_t end_io)
 {
 	return __blockdev_direct_IO(rw, iocb, inode, bdev, iov, offset,
-				    nr_segs, get_block, end_io,
+				    nr_segs, get_block, end_io, NULL,
 				    DIO_LOCKING | DIO_SKIP_HOLES);
 }
 
@@ -2279,7 +2284,7 @@ static inline ssize_t blockdev_direct_IO_no_locking(int rw, struct kiocb *iocb,
 	dio_iodone_t end_io)
 {
 	return __blockdev_direct_IO(rw, iocb, inode, bdev, iov, offset,
-				nr_segs, get_block, end_io, 0);
+				    nr_segs, get_block, end_io, NULL, 0);
 }
 #endif
 
-- 
1.6.6.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/6] direct-io: do not merge logically non-contiguous requests
  2010-05-21 17:03 [PATCH 1/6] fs: allow short direct-io reads to be completed via buffered IO Josef Bacik
  2010-05-21 17:03 ` [PATCH 2/6] direct-io: add a hook for the fs to provide its own submit_bio function Josef Bacik
@ 2010-05-21 17:03 ` Josef Bacik
  2010-05-22  1:47   ` Mike Fedyk
  2010-05-21 17:03 ` [PATCH 4/6] fs: kill blockdev_direct_IO_no_locking Josef Bacik
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Josef Bacik @ 2010-05-21 17:03 UTC (permalink / raw)
  To: linux-btrfs, linux-fsdevel, linux-kernel, hch, akpm

Btrfs cannot handle having logically non-contiguous requests submitted.  For
example if you have

Logical:  [0-4095][HOLE][8192-12287]
Physical: [0-4095]      [4096-8191]

Normally the DIO code would put these into the same BIO's.  The problem is we
need to know exactly what offset is associated with what BIO so we can do our
checksumming and unlocking properly, so putting them in the same BIO doesn't
work.  So add another check where we submit the current BIO if the physical
blocks are not contigous OR the logical blocks are not contiguous.

Signed-off-by: Josef Bacik <josef@redhat.com>
---

V1->V2
-Be more verbose in the in-code comment

 fs/direct-io.c |   20 ++++++++++++++++++--
 1 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/fs/direct-io.c b/fs/direct-io.c
index 5949947..da111aa 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -632,10 +632,26 @@ static int dio_send_cur_page(struct dio *dio)
 	int ret = 0;
 
 	if (dio->bio) {
+		loff_t cur_offset = dio->block_in_file << dio->blkbits;
+		loff_t bio_next_offset = dio->logical_offset_in_bio +
+			dio->bio->bi_size;
+
 		/*
-		 * See whether this new request is contiguous with the old
+		 * See whether this new request is contiguous with the old.
+		 *
+		 * Btrfs cannot handl having logically non-contiguous requests
+		 * submitted.  For exmple if you have
+		 *
+		 * Logical:  [0-4095][HOLE][8192-12287]
+		 * Phyiscal: [0-4095]      [4096-8181]
+		 *
+		 * We cannot submit those pages together as one BIO.  So if our
+		 * current logical offset in the file does not equal what would
+		 * be the next logical offset in the bio, submit the bio we
+		 * have.
 		 */
-		if (dio->final_block_in_bio != dio->cur_page_block)
+		if (dio->final_block_in_bio != dio->cur_page_block ||
+		    cur_offset != bio_next_offset)
 			dio_bio_submit(dio);
 		/*
 		 * Submit now if the underlying fs is about to perform a
-- 
1.6.6.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 4/6] fs: kill blockdev_direct_IO_no_locking
  2010-05-21 17:03 [PATCH 1/6] fs: allow short direct-io reads to be completed via buffered IO Josef Bacik
  2010-05-21 17:03 ` [PATCH 2/6] direct-io: add a hook for the fs to provide its own submit_bio function Josef Bacik
  2010-05-21 17:03 ` [PATCH 3/6] direct-io: do not merge logically non-contiguous requests Josef Bacik
@ 2010-05-21 17:03 ` Josef Bacik
  2010-05-21 17:03 ` [PATCH 5/6] Btrfs: add basic DIO read/write support Josef Bacik
  2010-05-21 17:03 ` [PATCH 6/6] Btrfs: do aio_write instead of write Josef Bacik
  4 siblings, 0 replies; 19+ messages in thread
From: Josef Bacik @ 2010-05-21 17:03 UTC (permalink / raw)
  To: linux-btrfs, linux-fsdevel, linux-kernel, hch, akpm

Christoph said he'd rather everybody use __blockdev_direct_IO directly instead
of having a bunch of random helper functions, so thats what this patch does.
It's a basic change, I've tested it with xfstests on ext4 and xfs.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/block_dev.c              |    5 +++--
 fs/ext4/inode.c             |    8 ++++----
 fs/gfs2/aops.c              |    6 +++---
 fs/ocfs2/aops.c             |    8 +++-----
 fs/xfs/linux-2.6/xfs_aops.c |    7 +++----
 include/linux/fs.h          |    9 ---------
 6 files changed, 16 insertions(+), 27 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 6dcee88..0f42cbc 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -172,8 +172,9 @@ blkdev_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
 	struct file *file = iocb->ki_filp;
 	struct inode *inode = file->f_mapping->host;
 
-	return blockdev_direct_IO_no_locking(rw, iocb, inode, I_BDEV(inode),
-				iov, offset, nr_segs, blkdev_get_blocks, NULL);
+	return __blockdev_direct_IO(rw, iocb, inode, I_BDEV(inode), iov,
+				    offset, nr_segs, blkdev_get_blocks, NULL,
+				    NULL, 0);
 }
 
 int __sync_blockdev(struct block_device *bdev, int wait)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 81d6054..8f37762 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3494,10 +3494,10 @@ static ssize_t ext4_ind_direct_IO(int rw, struct kiocb *iocb,
 
 retry:
 	if (rw == READ && ext4_should_dioread_nolock(inode))
-		ret = blockdev_direct_IO_no_locking(rw, iocb, inode,
-				 inode->i_sb->s_bdev, iov,
-				 offset, nr_segs,
-				 ext4_get_block, NULL);
+		ret = __blockdev_direct_IO(rw, iocb, inode,
+					   inode->i_sb->s_bdev, iov, offset,
+					   nr_segs, ext4_get_block, NULL, NULL,
+					   0);
 	else
 		ret = blockdev_direct_IO(rw, iocb, inode,
 				 inode->i_sb->s_bdev, iov,
diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 0c1d0b8..45b23b0 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -1039,9 +1039,9 @@ static ssize_t gfs2_direct_IO(int rw, struct kiocb *iocb,
 	if (rv != 1)
 		goto out; /* dio not valid, fall back to buffered i/o */
 
-	rv = blockdev_direct_IO_no_locking(rw, iocb, inode, inode->i_sb->s_bdev,
-					   iov, offset, nr_segs,
-					   gfs2_get_block_direct, NULL);
+	rv = __blockdev_direct_IO(rw, iocb, inode, inode->i_sb->s_bdev,
+				  iov, offset, nr_segs, gfs2_get_block_direct,
+				  NULL, NULL, 0);
 out:
 	gfs2_glock_dq_m(1, &gh);
 	gfs2_holder_uninit(&gh);
diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 21441dd..f2e53a9 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -669,11 +669,9 @@ static ssize_t ocfs2_direct_IO(int rw,
 	if (i_size_read(inode) <= offset)
 		return 0;
 
-	ret = blockdev_direct_IO_no_locking(rw, iocb, inode,
-					    inode->i_sb->s_bdev, iov, offset,
-					    nr_segs,
-					    ocfs2_direct_IO_get_blocks,
-					    ocfs2_dio_end_io);
+	ret = __blockdev_direct_IO(rw, iocb, inode, inode->i_sb->s_bdev, iov,
+				   offset, nr_segs, ocfs2_direct_IO_get_blocks,
+				   ocfs2_dio_end_io, NULL, 0);
 
 	mlog_exit(ret);
 	return ret;
diff --git a/fs/xfs/linux-2.6/xfs_aops.c b/fs/xfs/linux-2.6/xfs_aops.c
index 0f8b996..fcc14d8 100644
--- a/fs/xfs/linux-2.6/xfs_aops.c
+++ b/fs/xfs/linux-2.6/xfs_aops.c
@@ -1617,10 +1617,9 @@ xfs_vm_direct_IO(
 	iocb->private = xfs_alloc_ioend(inode, rw == WRITE ?
 					IOMAP_UNWRITTEN : IOMAP_READ);
 
-	ret = blockdev_direct_IO_no_locking(rw, iocb, inode, bdev, iov,
-					    offset, nr_segs,
-					    xfs_get_blocks_direct,
-					    xfs_end_io_direct);
+	ret = __blockdev_direct_IO(rw, iocb, inode, bdev, iov, offset, nr_segs,
+				   xfs_get_blocks_direct, xfs_end_io_direct,
+				   NULL, 0);
 
 	if (unlikely(ret != -EIOCBQUEUED && iocb->private))
 		xfs_destroy_ioend(iocb->private);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index f5dc079..5b7a8bd 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2277,15 +2277,6 @@ static inline ssize_t blockdev_direct_IO(int rw, struct kiocb *iocb,
 				    nr_segs, get_block, end_io, NULL,
 				    DIO_LOCKING | DIO_SKIP_HOLES);
 }
-
-static inline ssize_t blockdev_direct_IO_no_locking(int rw, struct kiocb *iocb,
-	struct inode *inode, struct block_device *bdev, const struct iovec *iov,
-	loff_t offset, unsigned long nr_segs, get_block_t get_block,
-	dio_iodone_t end_io)
-{
-	return __blockdev_direct_IO(rw, iocb, inode, bdev, iov, offset,
-				    nr_segs, get_block, end_io, NULL, 0);
-}
 #endif
 
 extern const struct file_operations generic_ro_fops;
-- 
1.6.6.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 5/6] Btrfs: add basic DIO read/write support
  2010-05-21 17:03 [PATCH 1/6] fs: allow short direct-io reads to be completed via buffered IO Josef Bacik
                   ` (2 preceding siblings ...)
  2010-05-21 17:03 ` [PATCH 4/6] fs: kill blockdev_direct_IO_no_locking Josef Bacik
@ 2010-05-21 17:03 ` Josef Bacik
  2010-05-21 17:03 ` [PATCH 6/6] Btrfs: do aio_write instead of write Josef Bacik
  4 siblings, 0 replies; 19+ messages in thread
From: Josef Bacik @ 2010-05-21 17:03 UTC (permalink / raw)
  To: linux-btrfs, linux-fsdevel, linux-kernel, hch, akpm

This provides basic DIO support for reading and writing.  It does not do the
work to recover from mismatching checksums, that will come later.  A few design
changes have been made from Jim's code (sorry Jim!)

1) Use the generic direct-io code.  Jim originally re-wrote all the generic DIO
code in order to account for all of BTRFS's oddities, but thanks to that work it
seems like the best bet is to just ignore compression and such and just opt to
fallback on buffered IO.

2) Fallback on buffered IO for compressed or inline extents.  Jim's code did
it's own buffering to make dio with compressed extents work.  Now we just
fallback onto normal buffered IO.

3) Use ordered extents for the writes so that all of the

lock_extent()
lookup_ordered()

type checks continue to work.

4) Do the lock_extent() lookup_ordered() loop in readpage so we don't race with
DIO writes.

I've tested this with fsx and everything works great.  This patch depends on my
dio and filemap.c patches to work.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
---
 fs/btrfs/ctree.h        |    2 +
 fs/btrfs/file-item.c    |   25 ++-
 fs/btrfs/file.c         |   69 +++++++-
 fs/btrfs/inode.c        |  486 ++++++++++++++++++++++++++++++++++++++++++++---
 fs/btrfs/ordered-data.c |   75 +++++++-
 fs/btrfs/ordered-data.h |    9 +-
 6 files changed, 630 insertions(+), 36 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 746a724..36994e8 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2257,6 +2257,8 @@ int btrfs_del_csums(struct btrfs_trans_handle *trans,
 		    struct btrfs_root *root, u64 bytenr, u64 len);
 int btrfs_lookup_bio_sums(struct btrfs_root *root, struct inode *inode,
 			  struct bio *bio, u32 *dst);
+int btrfs_lookup_bio_sums_dio(struct btrfs_root *root, struct inode *inode,
+			      struct bio *bio, u64 logical_offset, u32 *dst);
 int btrfs_insert_file_extent(struct btrfs_trans_handle *trans,
 			     struct btrfs_root *root,
 			     u64 objectid, u64 pos,
diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index 54a2550..34ea718 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -149,13 +149,14 @@ int btrfs_lookup_file_extent(struct btrfs_trans_handle *trans,
 }
 
 
-int btrfs_lookup_bio_sums(struct btrfs_root *root, struct inode *inode,
-			  struct bio *bio, u32 *dst)
+static int __btrfs_lookup_bio_sums(struct btrfs_root *root,
+				   struct inode *inode, struct bio *bio,
+				   u64 logical_offset, u32 *dst, int dio)
 {
 	u32 sum;
 	struct bio_vec *bvec = bio->bi_io_vec;
 	int bio_index = 0;
-	u64 offset;
+	u64 offset = 0;
 	u64 item_start_offset = 0;
 	u64 item_last_offset = 0;
 	u64 disk_bytenr;
@@ -174,8 +175,11 @@ int btrfs_lookup_bio_sums(struct btrfs_root *root, struct inode *inode,
 	WARN_ON(bio->bi_vcnt <= 0);
 
 	disk_bytenr = (u64)bio->bi_sector << 9;
+	if (dio)
+		offset = logical_offset;
 	while (bio_index < bio->bi_vcnt) {
-		offset = page_offset(bvec->bv_page) + bvec->bv_offset;
+		if (!dio)
+			offset = page_offset(bvec->bv_page) + bvec->bv_offset;
 		ret = btrfs_find_ordered_sum(inode, offset, disk_bytenr, &sum);
 		if (ret == 0)
 			goto found;
@@ -238,6 +242,7 @@ found:
 		else
 			set_state_private(io_tree, offset, sum);
 		disk_bytenr += bvec->bv_len;
+		offset += bvec->bv_len;
 		bio_index++;
 		bvec++;
 	}
@@ -245,6 +250,18 @@ found:
 	return 0;
 }
 
+int btrfs_lookup_bio_sums(struct btrfs_root *root, struct inode *inode,
+			  struct bio *bio, u32 *dst)
+{
+	return __btrfs_lookup_bio_sums(root, inode, bio, 0, dst, 0);
+}
+
+int btrfs_lookup_bio_sums_dio(struct btrfs_root *root, struct inode *inode,
+			      struct bio *bio, u64 offset, u32 *dst)
+{
+	return __btrfs_lookup_bio_sums(root, inode, bio, offset, dst, 1);
+}
+
 int btrfs_lookup_csums_range(struct btrfs_root *root, u64 start, u64 end,
 			     struct list_head *list)
 {
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 29ff749..dace07b 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -823,6 +823,47 @@ again:
 	return 0;
 }
 
+/* Copied from read-write.c */
+static void wait_on_retry_sync_kiocb(struct kiocb *iocb)
+{
+	set_current_state(TASK_UNINTERRUPTIBLE);
+	if (!kiocbIsKicked(iocb))
+		schedule();
+	else
+		kiocbClearKicked(iocb);
+	__set_current_state(TASK_RUNNING);
+}
+
+/*
+ * Just a copy of what do_sync_write does.
+ */
+static ssize_t __btrfs_direct_write(struct file *file, const char __user *buf,
+				    size_t count, loff_t pos, loff_t *ppos)
+{
+	struct iovec iov = { .iov_base = (void __user *)buf, .iov_len = count };
+	unsigned long nr_segs = 1;
+	struct kiocb kiocb;
+	ssize_t ret;
+
+	init_sync_kiocb(&kiocb, file);
+	kiocb.ki_pos = pos;
+	kiocb.ki_left = count;
+	kiocb.ki_nbytes = count;
+
+	while (1) {
+		ret = generic_file_direct_write(&kiocb, &iov, &nr_segs, pos,
+						ppos, count, count);
+		if (ret != -EIOCBRETRY)
+			break;
+		wait_on_retry_sync_kiocb(&kiocb);
+	}
+
+	if (ret == -EIOCBQUEUED)
+		ret = wait_on_sync_kiocb(&kiocb);
+	*ppos = kiocb.ki_pos;
+	return ret;
+}
+
 static ssize_t btrfs_file_write(struct file *file, const char __user *buf,
 				size_t count, loff_t *ppos)
 {
@@ -839,12 +880,11 @@ static ssize_t btrfs_file_write(struct file *file, const char __user *buf,
 	unsigned long first_index;
 	unsigned long last_index;
 	int will_write;
+	int buffered = 0;
 
 	will_write = ((file->f_flags & O_DSYNC) || IS_SYNC(inode) ||
 		      (file->f_flags & O_DIRECT));
 
-	nrptrs = min((count + PAGE_CACHE_SIZE - 1) / PAGE_CACHE_SIZE,
-		     PAGE_CACHE_SIZE / (sizeof(struct page *)));
 	pinned[0] = NULL;
 	pinned[1] = NULL;
 
@@ -875,13 +915,34 @@ static ssize_t btrfs_file_write(struct file *file, const char __user *buf,
 		goto out;
 
 	file_update_time(file);
+	BTRFS_I(inode)->sequence++;
+
+	if (unlikely(file->f_flags & O_DIRECT)) {
+		num_written = __btrfs_direct_write(file, buf, count, pos,
+						   ppos);
+		pos += num_written;
+		count -= num_written;
+
+		/* We've written everything we wanted to, exit */
+		if (num_written < 0 || !count)
+			goto out;
 
+		/*
+		 * We are going to do buffered for the rest of the range, so we
+		 * need to make sure to invalidate the buffered pages when we're
+		 * done.
+		 */
+		buffered = 1;
+		buf += num_written;
+	}
+
+	nrptrs = min((count + PAGE_CACHE_SIZE - 1) / PAGE_CACHE_SIZE,
+		     PAGE_CACHE_SIZE / (sizeof(struct page *)));
 	pages = kmalloc(nrptrs * sizeof(struct page *), GFP_KERNEL);
 
 	/* generic_write_checks can change our pos */
 	start_pos = pos;
 
-	BTRFS_I(inode)->sequence++;
 	first_index = pos >> PAGE_CACHE_SHIFT;
 	last_index = (pos + count) >> PAGE_CACHE_SHIFT;
 
@@ -1023,7 +1084,7 @@ out_nolock:
 				btrfs_end_transaction(trans, root);
 			}
 		}
-		if (file->f_flags & O_DIRECT) {
+		if (file->f_flags & O_DIRECT && buffered) {
 			invalidate_mapping_pages(inode->i_mapping,
 			      start_pos >> PAGE_CACHE_SHIFT,
 			     (start_pos + num_written - 1) >> PAGE_CACHE_SHIFT);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 2bfdc64..89a9455 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -697,6 +697,38 @@ retry:
 	return 0;
 }
 
+static u64 get_extent_allocation_hint(struct inode *inode, u64 start,
+				      u64 num_bytes)
+{
+	struct extent_map_tree *em_tree = &BTRFS_I(inode)->extent_tree;
+	struct extent_map *em;
+	u64 alloc_hint = 0;
+
+	read_lock(&em_tree->lock);
+	em = search_extent_mapping(em_tree, start, num_bytes);
+	if (em) {
+		/*
+		 * if block start isn't an actual block number then find the
+		 * first block in this inode and use that as a hint.  If that
+		 * block is also bogus then just don't worry about it.
+		 */
+		if (em->block_start >= EXTENT_MAP_LAST_BYTE) {
+			free_extent_map(em);
+			em = search_extent_mapping(em_tree, 0, 0);
+			if (em && em->block_start < EXTENT_MAP_LAST_BYTE)
+				alloc_hint = em->block_start;
+			if (em)
+				free_extent_map(em);
+		} else {
+			alloc_hint = em->block_start;
+			free_extent_map(em);
+		}
+	}
+	read_unlock(&em_tree->lock);
+
+	return alloc_hint;
+}
+
 /*
  * when extent_io.c finds a delayed allocation range in the file,
  * the call backs end up in this code.  The basic idea is to
@@ -769,29 +801,7 @@ static noinline int cow_file_range(struct inode *inode,
 	BUG_ON(disk_num_bytes >
 	       btrfs_super_total_bytes(&root->fs_info->super_copy));
 
-
-	read_lock(&BTRFS_I(inode)->extent_tree.lock);
-	em = search_extent_mapping(&BTRFS_I(inode)->extent_tree,
-				   start, num_bytes);
-	if (em) {
-		/*
-		 * if block start isn't an actual block number then find the
-		 * first block in this inode and use that as a hint.  If that
-		 * block is also bogus then just don't worry about it.
-		 */
-		if (em->block_start >= EXTENT_MAP_LAST_BYTE) {
-			free_extent_map(em);
-			em = search_extent_mapping(em_tree, 0, 0);
-			if (em && em->block_start < EXTENT_MAP_LAST_BYTE)
-				alloc_hint = em->block_start;
-			if (em)
-				free_extent_map(em);
-		} else {
-			alloc_hint = em->block_start;
-			free_extent_map(em);
-		}
-	}
-	read_unlock(&BTRFS_I(inode)->extent_tree.lock);
+	alloc_hint = get_extent_allocation_hint(inode, start, num_bytes);
 	btrfs_drop_extent_cache(inode, start, start + num_bytes - 1, 0);
 
 	while (disk_num_bytes > 0) {
@@ -4875,11 +4885,439 @@ out:
 	return em;
 }
 
+static struct extent_map *btrfs_new_extent_direct(struct inode *inode,
+						  u64 start, u64 len)
+{
+	struct btrfs_root *root = BTRFS_I(inode)->root;
+	struct btrfs_trans_handle *trans;
+	struct extent_map *em;
+	struct extent_map_tree *em_tree = &BTRFS_I(inode)->extent_tree;
+	struct btrfs_key ins;
+	u64 alloc_hint;
+	int ret;
+
+	btrfs_drop_extent_cache(inode, start, start + len - 1, 0);
+
+	trans = btrfs_start_transaction(root, 1);
+	if (!trans)
+		return ERR_PTR(-ENOMEM);
+
+	alloc_hint = get_extent_allocation_hint(inode, start, len);
+	ret = btrfs_reserve_extent(trans, root, len, root->sectorsize, 0,
+				   alloc_hint, (u64)-1, &ins, 1);
+	if (ret) {
+		em = ERR_PTR(ret);
+		goto out;
+	}
+
+	em = alloc_extent_map(GFP_NOFS);
+	if (!em) {
+		em = ERR_PTR(-ENOMEM);
+		goto out;
+	}
+
+	em->start = start;
+	em->orig_start = em->start;
+	em->len = ins.offset;
+
+	em->block_start = ins.objectid;
+	em->block_len = ins.offset;
+	em->bdev = root->fs_info->fs_devices->latest_bdev;
+	set_bit(EXTENT_FLAG_PINNED, &em->flags);
+
+	while (1) {
+		write_lock(&em_tree->lock);
+		ret = add_extent_mapping(em_tree, em);
+		write_unlock(&em_tree->lock);
+		if (ret != -EEXIST)
+			break;
+		btrfs_drop_extent_cache(inode, start, start + em->len - 1, 0);
+	}
+
+	ret = btrfs_add_ordered_extent_dio(inode, start, ins.objectid,
+					   ins.offset, ins.offset, 0);
+	if (ret) {
+		btrfs_free_reserved_extent(root, ins.objectid, ins.offset);
+		em = ERR_PTR(ret);
+	}
+out:
+	btrfs_end_transaction(trans, root);
+	return em;
+}
+
+static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock,
+				   struct buffer_head *bh_result, int create)
+{
+	struct extent_map *em;
+	struct btrfs_root *root = BTRFS_I(inode)->root;
+	u64 start = iblock << inode->i_blkbits;
+	u64 len = bh_result->b_size;
+
+	em = btrfs_get_extent(inode, NULL, 0, start, len, 0);
+	if (IS_ERR(em))
+		return PTR_ERR(em);
+
+	/*
+	 * Ok for INLINE and COMPRESSED extents we need to fallback on buffered
+	 * io.  INLINE is special, and we could probably kludge it in here, but
+	 * it's still buffered so for safety lets just fall back to the generic
+	 * buffered path.
+	 *
+	 * For COMPRESSED we _have_ to read the entire extent in so we can
+	 * decompress it, so there will be buffering required no matter what we
+	 * do, so go ahead and fallback to buffered.
+	 *
+	 * We return -ENOTBLK because thats what makes DIO go ahead and go back
+	 * to buffered IO.  Don't blame me, this is the price we pay for using
+	 * the generic code.
+	 */
+	if (test_bit(EXTENT_FLAG_COMPRESSED, &em->flags) ||
+	    em->block_start == EXTENT_MAP_INLINE) {
+		free_extent_map(em);
+		return -ENOTBLK;
+	}
+
+	/* Just a good old fashioned hole, return */
+	if (!create && (em->block_start == EXTENT_MAP_HOLE ||
+			test_bit(EXTENT_FLAG_PREALLOC, &em->flags))) {
+		free_extent_map(em);
+		/* DIO will do one hole at a time, so just unlock a sector */
+		unlock_extent(&BTRFS_I(inode)->io_tree, start,
+			      start + root->sectorsize - 1, GFP_NOFS);
+		return 0;
+	}
+
+	/*
+	 * We don't allocate a new extent in the following cases
+	 *
+	 * 1) The inode is marked as NODATACOW.  In this case we'll just use the
+	 * existing extent.
+	 * 2) The extent is marked as PREALLOC.  We're good to go here and can
+	 * just use the extent.
+	 *
+	 */
+	if (!create)
+		goto map;
+
+	if (test_bit(EXTENT_FLAG_PREALLOC, &em->flags) ||
+	    ((BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW) &&
+	     em->block_start != EXTENT_MAP_HOLE)) {
+		u64 block_start;
+		int type;
+		int ret;
+
+		if (test_bit(EXTENT_FLAG_PREALLOC, &em->flags))
+			type = BTRFS_ORDERED_PREALLOC;
+		else
+			type = BTRFS_ORDERED_NOCOW;
+		len = min(len, em->block_len - (start - em->start));
+		block_start = em->block_start + (start - em->start);
+		ret = btrfs_add_ordered_extent_dio(inode, start,
+						   start, len, len, type);
+		if (ret) {
+			free_extent_map(em);
+			return ret;
+		}
+	} else {
+		free_extent_map(em);
+		em = btrfs_new_extent_direct(inode, start, len);
+		if (IS_ERR(em))
+			return PTR_ERR(em);
+		len = min(len, em->block_len);
+	}
+	/* We always just use one extent worth of information */
+	spin_lock(&BTRFS_I(inode)->accounting_lock);
+	BTRFS_I(inode)->outstanding_extents++;
+	spin_unlock(&BTRFS_I(inode)->accounting_lock);
+	unlock_extent(&BTRFS_I(inode)->io_tree, start, start + len - 1,
+		      GFP_NOFS);
+map:
+	bh_result->b_blocknr = (em->block_start + (start - em->start)) >>
+		inode->i_blkbits;
+	bh_result->b_size = em->len - (start - em->start);
+	bh_result->b_bdev = em->bdev;
+	set_buffer_mapped(bh_result);
+	if (create && !test_bit(EXTENT_FLAG_PREALLOC, &em->flags))
+		set_buffer_new(bh_result);
+
+	free_extent_map(em);
+
+	return 0;
+}
+
+struct btrfs_dio_private {
+	struct inode *inode;
+	u64 logical_offset;
+	u64 disk_bytenr;
+	u64 bytes;
+	u32 *csums;
+	void *private;
+};
+
+static void btrfs_endio_direct_read(struct bio *bio, int err)
+{
+	struct bio_vec *bvec_end = bio->bi_io_vec + bio->bi_vcnt - 1;
+	struct bio_vec *bvec = bio->bi_io_vec;
+	struct btrfs_dio_private *dip = bio->bi_private;
+	struct inode *inode = dip->inode;
+	struct btrfs_root *root = BTRFS_I(inode)->root;
+	u64 start;
+	u32 *private = dip->csums;
+
+	start = dip->logical_offset;
+	do {
+		if (!(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM)) {
+			struct page *page = bvec->bv_page;
+			char *kaddr;
+			u32 csum = ~(u32)0;
+			unsigned long flags;
+
+			local_irq_save(flags);
+			kaddr = kmap_atomic(page, KM_IRQ0);
+			csum = btrfs_csum_data(root, kaddr + bvec->bv_offset,
+					       csum, bvec->bv_len);
+			btrfs_csum_final(csum, (char *)&csum);
+			kunmap_atomic(kaddr, KM_IRQ0);
+			local_irq_restore(flags);
+
+			flush_dcache_page(bvec->bv_page);
+			if (csum != *private) {
+				printk(KERN_ERR "btrfs csum failed ino %lu off"
+				      " %llu csum %u private %u\n",
+				      inode->i_ino, (unsigned long long)start,
+				      csum, *private);
+				err = -EIO;
+			}
+		}
+
+		start += bvec->bv_len;
+		private++;
+		bvec++;
+	} while (bvec <= bvec_end);
+
+	unlock_extent(&BTRFS_I(inode)->io_tree, dip->logical_offset,
+		      dip->logical_offset + dip->bytes - 1, GFP_NOFS);
+	bio->bi_private = dip->private;
+
+	kfree(dip->csums);
+	kfree(dip);
+	dio_end_io(bio, err);
+}
+
+static void btrfs_endio_direct_write(struct bio *bio, int err)
+{
+	struct btrfs_dio_private *dip = bio->bi_private;
+	struct inode *inode = dip->inode;
+	struct btrfs_root *root = BTRFS_I(inode)->root;
+	struct btrfs_trans_handle *trans;
+	struct btrfs_ordered_extent *ordered = NULL;
+	struct extent_state *cached_state = NULL;
+	int ret;
+
+	if (err)
+		goto out_done;
+
+	ret = btrfs_dec_test_ordered_pending(inode, &ordered,
+					     dip->logical_offset, dip->bytes);
+	if (!ret)
+		goto out_done;
+	BUG_ON(!ordered);
+
+	trans = btrfs_join_transaction(root, 1);
+	if (!trans) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	if (test_bit(BTRFS_ORDERED_NOCOW, &ordered->flags)) {
+		ret = btrfs_ordered_update_i_size(inode, 0, ordered);
+		if (!ret)
+			ret = btrfs_update_inode(trans, root, inode);
+		err = ret;
+		goto out;
+	}
+
+	lock_extent_bits(&BTRFS_I(inode)->io_tree, ordered->file_offset,
+			 ordered->file_offset + ordered->len - 1, 0,
+			 &cached_state, GFP_NOFS);
+
+	if (test_bit(BTRFS_ORDERED_PREALLOC, &ordered->flags)) {
+		ret = btrfs_mark_extent_written(trans, inode,
+						ordered->file_offset,
+						ordered->file_offset +
+						ordered->len);
+		if (ret) {
+			err = ret;
+			goto out_unlock;
+		}
+	} else {
+		ret = insert_reserved_file_extent(trans, inode,
+						  ordered->file_offset,
+						  ordered->start,
+						  ordered->disk_len,
+						  ordered->len,
+						  ordered->len,
+						  0, 0, 0,
+						  BTRFS_FILE_EXTENT_REG);
+		unpin_extent_cache(&BTRFS_I(inode)->extent_tree,
+				   ordered->file_offset, ordered->len);
+		if (ret) {
+			err = ret;
+			WARN_ON(1);
+			goto out_unlock;
+		}
+	}
+
+	add_pending_csums(trans, inode, ordered->file_offset, &ordered->list);
+	btrfs_ordered_update_i_size(inode, 0, ordered);
+	btrfs_update_inode(trans, root, inode);
+out_unlock:
+	unlock_extent_cached(&BTRFS_I(inode)->io_tree, ordered->file_offset,
+			     ordered->file_offset + ordered->len - 1,
+			     &cached_state, GFP_NOFS);
+out:
+	btrfs_end_transaction(trans, root);
+	btrfs_put_ordered_extent(ordered);
+	btrfs_put_ordered_extent(ordered);
+out_done:
+	bio->bi_private = dip->private;
+
+	kfree(dip->csums);
+	kfree(dip);
+	dio_end_io(bio, err);
+}
+
+static void btrfs_submit_direct(int rw, struct bio *bio, struct inode *inode,
+				loff_t file_offset)
+{
+	struct btrfs_root *root = BTRFS_I(inode)->root;
+	struct btrfs_dio_private *dip;
+	struct bio_vec *bvec = bio->bi_io_vec;
+	u64 start;
+	int skip_sum;
+	int write = rw & (1 << BIO_RW);
+	int ret = 0;
+
+	skip_sum = BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM;
+
+	dip = kmalloc(sizeof(*dip), GFP_NOFS);
+	if (!dip) {
+		ret = -ENOMEM;
+		goto free_ordered;
+	}
+	dip->csums = NULL;
+
+	if (!skip_sum) {
+		dip->csums = kmalloc(sizeof(u32) * bio->bi_vcnt, GFP_NOFS);
+		if (!dip->csums) {
+			ret = -ENOMEM;
+			goto free_ordered;
+		}
+	}
+
+	dip->private = bio->bi_private;
+	dip->inode = inode;
+	dip->logical_offset = file_offset;
+
+	start = dip->logical_offset;
+	dip->bytes = 0;
+	do {
+		dip->bytes += bvec->bv_len;
+		bvec++;
+	} while (bvec <= (bio->bi_io_vec + bio->bi_vcnt - 1));
+
+	dip->disk_bytenr = bio->bi_sector << 9;
+	bio->bi_private = dip;
+
+	if (write)
+		bio->bi_end_io = btrfs_endio_direct_write;
+	else
+		bio->bi_end_io = btrfs_endio_direct_read;
+
+	ret = btrfs_bio_wq_end_io(root->fs_info, bio, 0);
+	if (ret)
+		goto out_err;
+
+	if (write && !skip_sum)
+		btrfs_csum_one_bio(root, inode, bio, dip->logical_offset, 1);
+	else if (!skip_sum)
+		btrfs_lookup_bio_sums_dio(root, inode, bio,
+					  dip->logical_offset, dip->csums);
+
+	ret = btrfs_map_bio(root, rw, bio, 0, 0);
+	if (ret)
+		goto out_err;
+	return;
+out_err:
+	kfree(dip->csums);
+	kfree(dip);
+free_ordered:
+	/*
+	 * If this is a write, we need to clean up the reserved space and kill
+	 * the ordered extent.
+	 */
+	if (write) {
+		struct btrfs_ordered_extent *ordered;
+		ordered = btrfs_lookup_ordered_extent(inode,
+						      dip->logical_offset);
+		if (!test_bit(BTRFS_ORDERED_PREALLOC, &ordered->flags) &&
+		    !test_bit(BTRFS_ORDERED_NOCOW, &ordered->flags))
+			btrfs_free_reserved_extent(root, ordered->start,
+						   ordered->disk_len);
+		btrfs_put_ordered_extent(ordered);
+		btrfs_put_ordered_extent(ordered);
+	}
+	bio_endio(bio, ret);
+}
+
 static ssize_t btrfs_direct_IO(int rw, struct kiocb *iocb,
 			const struct iovec *iov, loff_t offset,
 			unsigned long nr_segs)
 {
-	return -EINVAL;
+	struct file *file = iocb->ki_filp;
+	struct inode *inode = file->f_mapping->host;
+	struct btrfs_ordered_extent *ordered;
+	u64 lockstart, lockend;
+	ssize_t ret;
+
+	lockstart = offset;
+	lockend = offset + iov_length(iov, nr_segs) - 1;
+	while (1) {
+		lock_extent(&BTRFS_I(inode)->io_tree, lockstart, lockend,
+			    GFP_NOFS);
+		/*
+		 * We're concerned with the entire range that we're going to be
+		 * doing DIO to, so we need to make sure theres no ordered
+		 * extents in this range.
+		 */
+		ordered = btrfs_lookup_ordered_range(inode, lockstart,
+						     lockend - lockstart + 1);
+		if (!ordered)
+			break;
+		unlock_extent(&BTRFS_I(inode)->io_tree, lockstart, lockend,
+			      GFP_NOFS);
+		btrfs_start_ordered_extent(inode, ordered, 1);
+		btrfs_put_ordered_extent(ordered);
+		cond_resched();
+	}
+
+	ret = __blockdev_direct_IO(rw, iocb, inode, NULL, iov, offset, nr_segs,
+				   btrfs_get_blocks_direct, NULL,
+				   btrfs_submit_direct, 0);
+
+	if (ret < 0 && ret != -EIOCBQUEUED) {
+		unlock_extent(&BTRFS_I(inode)->io_tree, offset,
+			      offset + iov_length(iov, nr_segs) - 1, GFP_NOFS);
+	} else if (ret >= 0 && ret < iov_length(iov, nr_segs)) {
+		/*
+		 * We're falling back to buffered, unlock the section we didn't
+		 * do IO on.
+		 */
+		unlock_extent(&BTRFS_I(inode)->io_tree, offset + ret,
+			      offset + iov_length(iov, nr_segs) - 1, GFP_NOFS);
+	}
+
+	return ret;
 }
 
 static int btrfs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index a127c0e..7a06e97 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -124,6 +124,15 @@ static int offset_in_entry(struct btrfs_ordered_extent *entry, u64 file_offset)
 	return 1;
 }
 
+static int range_overlaps(struct btrfs_ordered_extent *entry, u64 file_offset,
+			  u64 len)
+{
+	if (file_offset + len <= entry->file_offset ||
+	    entry->file_offset + entry->len <= file_offset)
+		return 0;
+	return 1;
+}
+
 /*
  * look find the first ordered struct that has this offset, otherwise
  * the first one less than this offset
@@ -161,8 +170,9 @@ static inline struct rb_node *tree_search(struct btrfs_ordered_inode_tree *tree,
  * The tree is given a single reference on the ordered extent that was
  * inserted.
  */
-int btrfs_add_ordered_extent(struct inode *inode, u64 file_offset,
-			     u64 start, u64 len, u64 disk_len, int type)
+static int __btrfs_add_ordered_extent(struct inode *inode, u64 file_offset,
+				      u64 start, u64 len, u64 disk_len,
+				      int type, int dio)
 {
 	struct btrfs_ordered_inode_tree *tree;
 	struct rb_node *node;
@@ -182,6 +192,9 @@ int btrfs_add_ordered_extent(struct inode *inode, u64 file_offset,
 	if (type != BTRFS_ORDERED_IO_DONE && type != BTRFS_ORDERED_COMPLETE)
 		set_bit(type, &entry->flags);
 
+	if (dio)
+		set_bit(BTRFS_ORDERED_DIRECT, &entry->flags);
+
 	/* one ref for the tree */
 	atomic_set(&entry->refs, 1);
 	init_waitqueue_head(&entry->wait);
@@ -203,6 +216,20 @@ int btrfs_add_ordered_extent(struct inode *inode, u64 file_offset,
 	return 0;
 }
 
+int btrfs_add_ordered_extent(struct inode *inode, u64 file_offset,
+			     u64 start, u64 len, u64 disk_len, int type)
+{
+	return __btrfs_add_ordered_extent(inode, file_offset, start, len,
+					  disk_len, type, 0);
+}
+
+int btrfs_add_ordered_extent_dio(struct inode *inode, u64 file_offset,
+				 u64 start, u64 len, u64 disk_len, int type)
+{
+	return __btrfs_add_ordered_extent(inode, file_offset, start, len,
+					  disk_len, type, 1);
+}
+
 /*
  * Add a struct btrfs_ordered_sum into the list of checksums to be inserted
  * when an ordered extent is finished.  If the list covers more than one
@@ -491,7 +518,8 @@ void btrfs_start_ordered_extent(struct inode *inode,
 	 * start IO on any dirty ones so the wait doesn't stall waiting
 	 * for pdflush to find them
 	 */
-	filemap_fdatawrite_range(inode->i_mapping, start, end);
+	if (!test_bit(BTRFS_ORDERED_DIRECT, &entry->flags))
+		filemap_fdatawrite_range(inode->i_mapping, start, end);
 	if (wait) {
 		wait_event(entry->wait, test_bit(BTRFS_ORDERED_COMPLETE,
 						 &entry->flags));
@@ -588,6 +616,47 @@ out:
 	return entry;
 }
 
+/* Since the DIO code tries to lock a wide area we need to look for any ordered
+ * extents that exist in the range, rather than just the start of the range.
+ */
+struct btrfs_ordered_extent *btrfs_lookup_ordered_range(struct inode *inode,
+							u64 file_offset,
+							u64 len)
+{
+	struct btrfs_ordered_inode_tree *tree;
+	struct rb_node *node;
+	struct btrfs_ordered_extent *entry = NULL;
+
+	tree = &BTRFS_I(inode)->ordered_tree;
+	spin_lock(&tree->lock);
+	node = tree_search(tree, file_offset);
+	if (!node) {
+		node = tree_search(tree, file_offset + len);
+		if (!node)
+			goto out;
+	}
+
+	while (1) {
+		entry = rb_entry(node, struct btrfs_ordered_extent, rb_node);
+		if (range_overlaps(entry, file_offset, len))
+			break;
+
+		if (entry->file_offset >= file_offset + len) {
+			entry = NULL;
+			break;
+		}
+		entry = NULL;
+		node = rb_next(node);
+		if (!node)
+			break;
+	}
+out:
+	if (entry)
+		atomic_inc(&entry->refs);
+	spin_unlock(&tree->lock);
+	return entry;
+}
+
 /*
  * lookup and return any extent before 'file_offset'.  NULL is returned
  * if none is found
diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h
index c82f76a..8ac3654 100644
--- a/fs/btrfs/ordered-data.h
+++ b/fs/btrfs/ordered-data.h
@@ -72,6 +72,8 @@ struct btrfs_ordered_sum {
 
 #define BTRFS_ORDERED_PREALLOC 4 /* set when writing to prealloced extent */
 
+#define BTRFS_ORDERED_DIRECT 5 /* set when we're doing DIO with this extent */
+
 struct btrfs_ordered_extent {
 	/* logical offset in the file */
 	u64 file_offset;
@@ -140,7 +142,9 @@ int btrfs_dec_test_ordered_pending(struct inode *inode,
 				   struct btrfs_ordered_extent **cached,
 				   u64 file_offset, u64 io_size);
 int btrfs_add_ordered_extent(struct inode *inode, u64 file_offset,
-			     u64 start, u64 len, u64 disk_len, int tyep);
+			     u64 start, u64 len, u64 disk_len, int type);
+int btrfs_add_ordered_extent_dio(struct inode *inode, u64 file_offset,
+				 u64 start, u64 len, u64 disk_len, int type);
 int btrfs_add_ordered_sum(struct inode *inode,
 			  struct btrfs_ordered_extent *entry,
 			  struct btrfs_ordered_sum *sum);
@@ -151,6 +155,9 @@ void btrfs_start_ordered_extent(struct inode *inode,
 int btrfs_wait_ordered_range(struct inode *inode, u64 start, u64 len);
 struct btrfs_ordered_extent *
 btrfs_lookup_first_ordered_extent(struct inode * inode, u64 file_offset);
+struct btrfs_ordered_extent *btrfs_lookup_ordered_range(struct inode *inode,
+							u64 file_offset,
+							u64 len);
 int btrfs_ordered_update_i_size(struct inode *inode, u64 offset,
 				struct btrfs_ordered_extent *ordered);
 int btrfs_find_ordered_sum(struct inode *inode, u64 offset, u64 disk_bytenr, u32 *sum);
-- 
1.6.6.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 6/6] Btrfs: do aio_write instead of write
  2010-05-21 17:03 [PATCH 1/6] fs: allow short direct-io reads to be completed via buffered IO Josef Bacik
                   ` (3 preceding siblings ...)
  2010-05-21 17:03 ` [PATCH 5/6] Btrfs: add basic DIO read/write support Josef Bacik
@ 2010-05-21 17:03 ` Josef Bacik
  2010-05-23  8:31   ` Shi Weihua
                     ` (2 more replies)
  4 siblings, 3 replies; 19+ messages in thread
From: Josef Bacik @ 2010-05-21 17:03 UTC (permalink / raw)
  To: linux-btrfs, linux-fsdevel, linux-kernel, hch, akpm

In order for AIO to work, we need to implement aio_write.  This patch converts
our btrfs_file_write to btrfs_aio_write.  I've tested this with xfstests and
nothing broke, and the AIO stuff magically started working.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
---
 fs/btrfs/extent_io.c |   11 +++-
 fs/btrfs/file.c      |  152 +++++++++++++++++++++++---------------------------
 2 files changed, 80 insertions(+), 83 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index d2d0368..c407f1c 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2020,6 +2020,7 @@ static int __extent_read_full_page(struct extent_io_tree *tree,
 	sector_t sector;
 	struct extent_map *em;
 	struct block_device *bdev;
+	struct btrfs_ordered_extent *ordered;
 	int ret;
 	int nr = 0;
 	size_t page_offset = 0;
@@ -2031,7 +2032,15 @@ static int __extent_read_full_page(struct extent_io_tree *tree,
 	set_page_extent_mapped(page);
 
 	end = page_end;
-	lock_extent(tree, start, end, GFP_NOFS);
+	while (1) {
+		lock_extent(tree, start, end, GFP_NOFS);
+		ordered = btrfs_lookup_ordered_extent(inode, start);
+		if (!ordered)
+			break;
+		unlock_extent(tree, start, end, GFP_NOFS);
+		btrfs_start_ordered_extent(inode, ordered, 1);
+		btrfs_put_ordered_extent(ordered);
+	}
 
 	if (page->index == last_byte >> PAGE_CACHE_SHIFT) {
 		char *userpage;
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index dace07b..ce35431 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -46,32 +46,42 @@
 static noinline int btrfs_copy_from_user(loff_t pos, int num_pages,
 					 int write_bytes,
 					 struct page **prepared_pages,
-					 const char __user *buf)
+					 struct iov_iter *i)
 {
-	long page_fault = 0;
-	int i;
+	size_t copied;
+	int pg = 0;
 	int offset = pos & (PAGE_CACHE_SIZE - 1);
 
-	for (i = 0; i < num_pages && write_bytes > 0; i++, offset = 0) {
+	while (write_bytes > 0) {
 		size_t count = min_t(size_t,
 				     PAGE_CACHE_SIZE - offset, write_bytes);
-		struct page *page = prepared_pages[i];
-		fault_in_pages_readable(buf, count);
+		struct page *page = prepared_pages[pg];
+again:
+		if (unlikely(iov_iter_fault_in_readable(i, count)))
+			return -EFAULT;
 
 		/* Copy data from userspace to the current page */
-		kmap(page);
-		page_fault = __copy_from_user(page_address(page) + offset,
-					      buf, count);
+		copied = iov_iter_copy_from_user(page, i, offset, count);
+
 		/* Flush processor's dcache for this page */
 		flush_dcache_page(page);
-		kunmap(page);
-		buf += count;
-		write_bytes -= count;
+		iov_iter_advance(i, copied);
+		write_bytes -= copied;
 
-		if (page_fault)
-			break;
+		if (unlikely(copied == 0)) {
+			count = min_t(size_t, PAGE_CACHE_SIZE - offset,
+				      iov_iter_single_seg_count(i));
+			goto again;
+		}
+
+		if (unlikely(copied < PAGE_CACHE_SIZE - offset)) {
+			offset += copied;
+		} else {
+			pg++;
+			offset = 0;
+		}
 	}
-	return page_fault ? -EFAULT : 0;
+	return 0;
 }
 
 /*
@@ -823,60 +833,24 @@ again:
 	return 0;
 }
 
-/* Copied from read-write.c */
-static void wait_on_retry_sync_kiocb(struct kiocb *iocb)
-{
-	set_current_state(TASK_UNINTERRUPTIBLE);
-	if (!kiocbIsKicked(iocb))
-		schedule();
-	else
-		kiocbClearKicked(iocb);
-	__set_current_state(TASK_RUNNING);
-}
-
-/*
- * Just a copy of what do_sync_write does.
- */
-static ssize_t __btrfs_direct_write(struct file *file, const char __user *buf,
-				    size_t count, loff_t pos, loff_t *ppos)
-{
-	struct iovec iov = { .iov_base = (void __user *)buf, .iov_len = count };
-	unsigned long nr_segs = 1;
-	struct kiocb kiocb;
-	ssize_t ret;
-
-	init_sync_kiocb(&kiocb, file);
-	kiocb.ki_pos = pos;
-	kiocb.ki_left = count;
-	kiocb.ki_nbytes = count;
-
-	while (1) {
-		ret = generic_file_direct_write(&kiocb, &iov, &nr_segs, pos,
-						ppos, count, count);
-		if (ret != -EIOCBRETRY)
-			break;
-		wait_on_retry_sync_kiocb(&kiocb);
-	}
-
-	if (ret == -EIOCBQUEUED)
-		ret = wait_on_sync_kiocb(&kiocb);
-	*ppos = kiocb.ki_pos;
-	return ret;
-}
-
-static ssize_t btrfs_file_write(struct file *file, const char __user *buf,
-				size_t count, loff_t *ppos)
+static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
+				    const struct iovec *iov,
+				    unsigned long nr_segs, loff_t pos)
 {
-	loff_t pos;
+	struct file *file = iocb->ki_filp;
+	struct inode *inode = fdentry(file)->d_inode;
+	struct btrfs_root *root = BTRFS_I(inode)->root;
+	struct page *pinned[2];
+	struct page **pages = NULL;
+	struct iov_iter i;
+	loff_t *ppos = &iocb->ki_pos;
 	loff_t start_pos;
 	ssize_t num_written = 0;
 	ssize_t err = 0;
+	size_t count;
+	size_t ocount;
 	int ret = 0;
-	struct inode *inode = fdentry(file)->d_inode;
-	struct btrfs_root *root = BTRFS_I(inode)->root;
-	struct page **pages = NULL;
 	int nrptrs;
-	struct page *pinned[2];
 	unsigned long first_index;
 	unsigned long last_index;
 	int will_write;
@@ -888,7 +862,6 @@ static ssize_t btrfs_file_write(struct file *file, const char __user *buf,
 	pinned[0] = NULL;
 	pinned[1] = NULL;
 
-	pos = *ppos;
 	start_pos = pos;
 
 	vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE);
@@ -902,6 +875,11 @@ static ssize_t btrfs_file_write(struct file *file, const char __user *buf,
 
 	mutex_lock(&inode->i_mutex);
 
+	err = generic_segment_checks(iov, &nr_segs, &ocount, VERIFY_READ);
+	if (err)
+		goto out;
+	count = ocount;
+
 	current->backing_dev_info = inode->i_mapping->backing_dev_info;
 	err = generic_write_checks(file, &pos, &count, S_ISBLK(inode->i_mode));
 	if (err)
@@ -918,14 +896,24 @@ static ssize_t btrfs_file_write(struct file *file, const char __user *buf,
 	BTRFS_I(inode)->sequence++;
 
 	if (unlikely(file->f_flags & O_DIRECT)) {
-		num_written = __btrfs_direct_write(file, buf, count, pos,
-						   ppos);
-		pos += num_written;
-		count -= num_written;
+		ret = btrfs_check_data_free_space(root, inode, count);
+		if (ret)
+			goto out;
 
-		/* We've written everything we wanted to, exit */
-		if (num_written < 0 || !count)
+		num_written = generic_file_direct_write(iocb, iov, &nr_segs,
+							pos, ppos, count,
+							ocount);
+
+		/* All reservations for DIO are done internally */
+		btrfs_free_reserved_data_space(root, inode, count);
+
+		if (num_written < 0) {
+			ret = num_written;
+			num_written = 0;
+			goto out;
+		} else if (num_written == count) {
 			goto out;
+		}
 
 		/*
 		 * We are going to do buffered for the rest of the range, so we
@@ -933,18 +921,20 @@ static ssize_t btrfs_file_write(struct file *file, const char __user *buf,
 		 * done.
 		 */
 		buffered = 1;
-		buf += num_written;
+		pos += num_written;
 	}
 
-	nrptrs = min((count + PAGE_CACHE_SIZE - 1) / PAGE_CACHE_SIZE,
-		     PAGE_CACHE_SIZE / (sizeof(struct page *)));
+	iov_iter_init(&i, iov, nr_segs, count, num_written);
+	nrptrs = min((iov_iter_count(&i) + PAGE_CACHE_SIZE - 1) /
+		     PAGE_CACHE_SIZE, PAGE_CACHE_SIZE /
+		     (sizeof(struct page *)));
 	pages = kmalloc(nrptrs * sizeof(struct page *), GFP_KERNEL);
 
 	/* generic_write_checks can change our pos */
 	start_pos = pos;
 
 	first_index = pos >> PAGE_CACHE_SHIFT;
-	last_index = (pos + count) >> PAGE_CACHE_SHIFT;
+	last_index = (pos + iov_iter_count(&i)) >> PAGE_CACHE_SHIFT;
 
 	/*
 	 * there are lots of better ways to do this, but this code
@@ -961,7 +951,7 @@ static ssize_t btrfs_file_write(struct file *file, const char __user *buf,
 			unlock_page(pinned[0]);
 		}
 	}
-	if ((pos + count) & (PAGE_CACHE_SIZE - 1)) {
+	if ((pos + iov_iter_count(&i)) & (PAGE_CACHE_SIZE - 1)) {
 		pinned[1] = grab_cache_page(inode->i_mapping, last_index);
 		if (!PageUptodate(pinned[1])) {
 			ret = btrfs_readpage(NULL, pinned[1]);
@@ -972,10 +962,10 @@ static ssize_t btrfs_file_write(struct file *file, const char __user *buf,
 		}
 	}
 
-	while (count > 0) {
+	while (iov_iter_count(&i) > 0) {
 		size_t offset = pos & (PAGE_CACHE_SIZE - 1);
-		size_t write_bytes = min(count, nrptrs *
-					(size_t)PAGE_CACHE_SIZE -
+		size_t write_bytes = min(iov_iter_count(&i),
+					 nrptrs * (size_t)PAGE_CACHE_SIZE -
 					 offset);
 		size_t num_pages = (write_bytes + PAGE_CACHE_SIZE - 1) >>
 					PAGE_CACHE_SHIFT;
@@ -997,7 +987,7 @@ static ssize_t btrfs_file_write(struct file *file, const char __user *buf,
 		}
 
 		ret = btrfs_copy_from_user(pos, num_pages,
-					   write_bytes, pages, buf);
+					   write_bytes, pages, &i);
 		if (ret) {
 			btrfs_free_reserved_data_space(root, inode,
 						       write_bytes);
@@ -1026,8 +1016,6 @@ static ssize_t btrfs_file_write(struct file *file, const char __user *buf,
 			btrfs_throttle(root);
 		}
 
-		buf += write_bytes;
-		count -= write_bytes;
 		pos += write_bytes;
 		num_written += write_bytes;
 
@@ -1222,7 +1210,7 @@ const struct file_operations btrfs_file_operations = {
 	.read		= do_sync_read,
 	.aio_read       = generic_file_aio_read,
 	.splice_read	= generic_file_splice_read,
-	.write		= btrfs_file_write,
+	.aio_write	= btrfs_file_aio_write,
 	.mmap		= btrfs_file_mmap,
 	.open		= generic_file_open,
 	.release	= btrfs_release_file,
-- 
1.6.6.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/6] direct-io: do not merge logically non-contiguous requests
  2010-05-21 17:03 ` [PATCH 3/6] direct-io: do not merge logically non-contiguous requests Josef Bacik
@ 2010-05-22  1:47   ` Mike Fedyk
  2010-05-22 14:02     ` Josef Bacik
  0 siblings, 1 reply; 19+ messages in thread
From: Mike Fedyk @ 2010-05-22  1:47 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, linux-fsdevel, linux-kernel, hch, akpm

On Fri, May 21, 2010 at 10:03 AM, Josef Bacik <josef@redhat.com> wrote:
> Btrfs cannot handle having logically non-contiguous requests submitte=
d. =C2=A0For
> example if you have
>
> Logical: =C2=A0[0-4095][HOLE][8192-12287]
> Physical: [0-4095] =C2=A0 =C2=A0 =C2=A0[4096-8191]
>
> Normally the DIO code would put these into the same BIO's. =C2=A0The =
problem is we
> need to know exactly what offset is associated with what BIO so we ca=
n do our
> checksumming and unlocking properly, so putting them in the same BIO =
doesn't
> work. =C2=A0So add another check where we submit the current BIO if t=
he physical
> blocks are not contigous OR the logical blocks are not contiguous.
>
> Signed-off-by: Josef Bacik <josef@redhat.com>
> ---
>
> V1->V2
> -Be more verbose in the in-code comment
>
> =C2=A0fs/direct-io.c | =C2=A0 20 ++++++++++++++++++--
> =C2=A01 files changed, 18 insertions(+), 2 deletions(-)
>

Btrfs has been pretty much self-contained (working well compiled
against 2.6.32 for example).  Is there a way that this wouldn't just
start silently breaking for people compiling the latest btrfs with
dkms against older kernels?
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" =
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/6] direct-io: do not merge logically non-contiguous requests
  2010-05-22  1:47   ` Mike Fedyk
@ 2010-05-22 14:02     ` Josef Bacik
  0 siblings, 0 replies; 19+ messages in thread
From: Josef Bacik @ 2010-05-22 14:02 UTC (permalink / raw)
  To: Mike Fedyk
  Cc: Josef Bacik, linux-btrfs, linux-fsdevel, linux-kernel, hch, akpm

On Fri, May 21, 2010 at 06:47:36PM -0700, Mike Fedyk wrote:
> On Fri, May 21, 2010 at 10:03 AM, Josef Bacik <josef@redhat.com> wrot=
e:
> > Btrfs cannot handle having logically non-contiguous requests submit=
ted. =A0For
> > example if you have
> >
> > Logical: =A0[0-4095][HOLE][8192-12287]
> > Physical: [0-4095] =A0 =A0 =A0[4096-8191]
> >
> > Normally the DIO code would put these into the same BIO's. =A0The p=
roblem is we
> > need to know exactly what offset is associated with what BIO so we =
can do our
> > checksumming and unlocking properly, so putting them in the same BI=
O doesn't
> > work. =A0So add another check where we submit the current BIO if th=
e physical
> > blocks are not contigous OR the logical blocks are not contiguous.
> >
> > Signed-off-by: Josef Bacik <josef@redhat.com>
> > ---
> >
> > V1->V2
> > -Be more verbose in the in-code comment
> >
> > =A0fs/direct-io.c | =A0 20 ++++++++++++++++++--
> > =A01 files changed, 18 insertions(+), 2 deletions(-)
> >
>=20
> Btrfs has been pretty much self-contained (working well compiled
> against 2.6.32 for example).  Is there a way that this wouldn't just
> start silently breaking for people compiling the latest btrfs with
> dkms against older kernels?

Nope, you have to have these generic patches for DIO to work, so buildi=
ng btrfs
like this will stop working with earlier kernels.  Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel=
" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/6] Btrfs: do aio_write instead of write
  2010-05-21 17:03 ` [PATCH 6/6] Btrfs: do aio_write instead of write Josef Bacik
@ 2010-05-23  8:31   ` Shi Weihua
  2010-05-23 23:59     ` Josef Bacik
  2010-05-27  3:06   ` liubo
  2010-05-30  9:33   ` Dmitri Nikulin
  2 siblings, 1 reply; 19+ messages in thread
From: Shi Weihua @ 2010-05-23  8:31 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, linux-fsdevel, linux-kernel, hch, akpm

at 2010-5-22 1:03, Josef Bacik wrote:
> In order for AIO to work, we need to implement aio_write.  This patch converts
> our btrfs_file_write to btrfs_aio_write.  I've tested this with xfstests and
> nothing broke, and the AIO stuff magically started working.  Thanks,

But xfstests's case 198(source: src/aio-dio-regress/aiodio_sparse2.c) still failed, 
following message outputted.
--------------------
AIO write offset 0 expected 65536 got -22
AIO write offset 5242880 expected 65536 got -22
AIO write offset 10485760 expected 65536 got -22
AIO write offset 15728640 expected 65536 got -22
AIO write offset 20971520 expected 65536 got -22
AIO write offset 31457280 expected 65536 got -22
AIO write offset 36700160 expected 65536 got -22
AIO write offset 41943040 expected 65536 got -22
AIO write offset 47185920 expected 65536 got -22
AIO write offset 52428800 expected 65536 got -22
AIO write offset 57671680 expected 65536 got -22
AIO write offset 62914560 expected 65536 got -22
AIO write offset 73400320 expected 65536 got -22
AIO write offset 78643200 expected 65536 got -22
non one buffer at buf[0] => 0x00,00,00,00
non-one read at offset 0
*** WARNING *** /tmp/aaaa has not been unlinked; if you don't rm it manually first, it may influence the next run
-------------------- 

generic_file_direct_write()(in btrfs_file_aio_write(), fs/btrfs/file.c) returned -22, 
maybe it's useful for your analysing.

Thanks.

> 
> Signed-off-by: Josef Bacik <josef@redhat.com>
> ---
>  fs/btrfs/extent_io.c |   11 +++-
>  fs/btrfs/file.c      |  152 +++++++++++++++++++++++---------------------------
>  2 files changed, 80 insertions(+), 83 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index d2d0368..c407f1c 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -2020,6 +2020,7 @@ static int __extent_read_full_page(struct extent_io_tree *tree,
>  	sector_t sector;
>  	struct extent_map *em;
>  	struct block_device *bdev;
> +	struct btrfs_ordered_extent *ordered;
>  	int ret;
>  	int nr = 0;
>  	size_t page_offset = 0;
> @@ -2031,7 +2032,15 @@ static int __extent_read_full_page(struct extent_io_tree *tree,
>  	set_page_extent_mapped(page);
>  
>  	end = page_end;
> -	lock_extent(tree, start, end, GFP_NOFS);
> +	while (1) {
> +		lock_extent(tree, start, end, GFP_NOFS);
> +		ordered = btrfs_lookup_ordered_extent(inode, start);
> +		if (!ordered)
> +			break;
> +		unlock_extent(tree, start, end, GFP_NOFS);
> +		btrfs_start_ordered_extent(inode, ordered, 1);
> +		btrfs_put_ordered_extent(ordered);
> +	}
>  
>  	if (page->index == last_byte >> PAGE_CACHE_SHIFT) {
>  		char *userpage;
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index dace07b..ce35431 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -46,32 +46,42 @@
>  static noinline int btrfs_copy_from_user(loff_t pos, int num_pages,
>  					 int write_bytes,
>  					 struct page **prepared_pages,
> -					 const char __user *buf)
> +					 struct iov_iter *i)
>  {
> -	long page_fault = 0;
> -	int i;
> +	size_t copied;
> +	int pg = 0;
>  	int offset = pos & (PAGE_CACHE_SIZE - 1);
>  
> -	for (i = 0; i < num_pages && write_bytes > 0; i++, offset = 0) {
> +	while (write_bytes > 0) {
>  		size_t count = min_t(size_t,
>  				     PAGE_CACHE_SIZE - offset, write_bytes);
> -		struct page *page = prepared_pages[i];
> -		fault_in_pages_readable(buf, count);
> +		struct page *page = prepared_pages[pg];
> +again:
> +		if (unlikely(iov_iter_fault_in_readable(i, count)))
> +			return -EFAULT;
>  
>  		/* Copy data from userspace to the current page */
> -		kmap(page);
> -		page_fault = __copy_from_user(page_address(page) + offset,
> -					      buf, count);
> +		copied = iov_iter_copy_from_user(page, i, offset, count);
> +
>  		/* Flush processor's dcache for this page */
>  		flush_dcache_page(page);
> -		kunmap(page);
> -		buf += count;
> -		write_bytes -= count;
> +		iov_iter_advance(i, copied);
> +		write_bytes -= copied;
>  
> -		if (page_fault)
> -			break;
> +		if (unlikely(copied == 0)) {
> +			count = min_t(size_t, PAGE_CACHE_SIZE - offset,
> +				      iov_iter_single_seg_count(i));
> +			goto again;
> +		}
> +
> +		if (unlikely(copied < PAGE_CACHE_SIZE - offset)) {
> +			offset += copied;
> +		} else {
> +			pg++;
> +			offset = 0;
> +		}
>  	}
> -	return page_fault ? -EFAULT : 0;
> +	return 0;
>  }
>  
>  /*
> @@ -823,60 +833,24 @@ again:
>  	return 0;
>  }
>  
> -/* Copied from read-write.c */
> -static void wait_on_retry_sync_kiocb(struct kiocb *iocb)
> -{
> -	set_current_state(TASK_UNINTERRUPTIBLE);
> -	if (!kiocbIsKicked(iocb))
> -		schedule();
> -	else
> -		kiocbClearKicked(iocb);
> -	__set_current_state(TASK_RUNNING);
> -}
> -
> -/*
> - * Just a copy of what do_sync_write does.
> - */
> -static ssize_t __btrfs_direct_write(struct file *file, const char __user *buf,
> -				    size_t count, loff_t pos, loff_t *ppos)
> -{
> -	struct iovec iov = { .iov_base = (void __user *)buf, .iov_len = count };
> -	unsigned long nr_segs = 1;
> -	struct kiocb kiocb;
> -	ssize_t ret;
> -
> -	init_sync_kiocb(&kiocb, file);
> -	kiocb.ki_pos = pos;
> -	kiocb.ki_left = count;
> -	kiocb.ki_nbytes = count;
> -
> -	while (1) {
> -		ret = generic_file_direct_write(&kiocb, &iov, &nr_segs, pos,
> -						ppos, count, count);
> -		if (ret != -EIOCBRETRY)
> -			break;
> -		wait_on_retry_sync_kiocb(&kiocb);
> -	}
> -
> -	if (ret == -EIOCBQUEUED)
> -		ret = wait_on_sync_kiocb(&kiocb);
> -	*ppos = kiocb.ki_pos;
> -	return ret;
> -}
> -
> -static ssize_t btrfs_file_write(struct file *file, const char __user *buf,
> -				size_t count, loff_t *ppos)
> +static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
> +				    const struct iovec *iov,
> +				    unsigned long nr_segs, loff_t pos)
>  {
> -	loff_t pos;
> +	struct file *file = iocb->ki_filp;
> +	struct inode *inode = fdentry(file)->d_inode;
> +	struct btrfs_root *root = BTRFS_I(inode)->root;
> +	struct page *pinned[2];
> +	struct page **pages = NULL;
> +	struct iov_iter i;
> +	loff_t *ppos = &iocb->ki_pos;
>  	loff_t start_pos;
>  	ssize_t num_written = 0;
>  	ssize_t err = 0;
> +	size_t count;
> +	size_t ocount;
>  	int ret = 0;
> -	struct inode *inode = fdentry(file)->d_inode;
> -	struct btrfs_root *root = BTRFS_I(inode)->root;
> -	struct page **pages = NULL;
>  	int nrptrs;
> -	struct page *pinned[2];
>  	unsigned long first_index;
>  	unsigned long last_index;
>  	int will_write;
> @@ -888,7 +862,6 @@ static ssize_t btrfs_file_write(struct file *file, const char __user *buf,
>  	pinned[0] = NULL;
>  	pinned[1] = NULL;
>  
> -	pos = *ppos;
>  	start_pos = pos;
>  
>  	vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE);
> @@ -902,6 +875,11 @@ static ssize_t btrfs_file_write(struct file *file, const char __user *buf,
>  
>  	mutex_lock(&inode->i_mutex);
>  
> +	err = generic_segment_checks(iov, &nr_segs, &ocount, VERIFY_READ);
> +	if (err)
> +		goto out;
> +	count = ocount;
> +
>  	current->backing_dev_info = inode->i_mapping->backing_dev_info;
>  	err = generic_write_checks(file, &pos, &count, S_ISBLK(inode->i_mode));
>  	if (err)
> @@ -918,14 +896,24 @@ static ssize_t btrfs_file_write(struct file *file, const char __user *buf,
>  	BTRFS_I(inode)->sequence++;
>  
>  	if (unlikely(file->f_flags & O_DIRECT)) {
> -		num_written = __btrfs_direct_write(file, buf, count, pos,
> -						   ppos);
> -		pos += num_written;
> -		count -= num_written;
> +		ret = btrfs_check_data_free_space(root, inode, count);
> +		if (ret)
> +			goto out;
>  
> -		/* We've written everything we wanted to, exit */
> -		if (num_written < 0 || !count)
> +		num_written = generic_file_direct_write(iocb, iov, &nr_segs,
> +							pos, ppos, count,
> +							ocount);
> +
> +		/* All reservations for DIO are done internally */
> +		btrfs_free_reserved_data_space(root, inode, count);
> +
> +		if (num_written < 0) {
> +			ret = num_written;
> +			num_written = 0;
> +			goto out;
> +		} else if (num_written == count) {
>  			goto out;
> +		}
>  
>  		/*
>  		 * We are going to do buffered for the rest of the range, so we
> @@ -933,18 +921,20 @@ static ssize_t btrfs_file_write(struct file *file, const char __user *buf,
>  		 * done.
>  		 */
>  		buffered = 1;
> -		buf += num_written;
> +		pos += num_written;
>  	}
>  
> -	nrptrs = min((count + PAGE_CACHE_SIZE - 1) / PAGE_CACHE_SIZE,
> -		     PAGE_CACHE_SIZE / (sizeof(struct page *)));
> +	iov_iter_init(&i, iov, nr_segs, count, num_written);
> +	nrptrs = min((iov_iter_count(&i) + PAGE_CACHE_SIZE - 1) /
> +		     PAGE_CACHE_SIZE, PAGE_CACHE_SIZE /
> +		     (sizeof(struct page *)));
>  	pages = kmalloc(nrptrs * sizeof(struct page *), GFP_KERNEL);
>  
>  	/* generic_write_checks can change our pos */
>  	start_pos = pos;
>  
>  	first_index = pos >> PAGE_CACHE_SHIFT;
> -	last_index = (pos + count) >> PAGE_CACHE_SHIFT;
> +	last_index = (pos + iov_iter_count(&i)) >> PAGE_CACHE_SHIFT;
>  
>  	/*
>  	 * there are lots of better ways to do this, but this code
> @@ -961,7 +951,7 @@ static ssize_t btrfs_file_write(struct file *file, const char __user *buf,
>  			unlock_page(pinned[0]);
>  		}
>  	}
> -	if ((pos + count) & (PAGE_CACHE_SIZE - 1)) {
> +	if ((pos + iov_iter_count(&i)) & (PAGE_CACHE_SIZE - 1)) {
>  		pinned[1] = grab_cache_page(inode->i_mapping, last_index);
>  		if (!PageUptodate(pinned[1])) {
>  			ret = btrfs_readpage(NULL, pinned[1]);
> @@ -972,10 +962,10 @@ static ssize_t btrfs_file_write(struct file *file, const char __user *buf,
>  		}
>  	}
>  
> -	while (count > 0) {
> +	while (iov_iter_count(&i) > 0) {
>  		size_t offset = pos & (PAGE_CACHE_SIZE - 1);
> -		size_t write_bytes = min(count, nrptrs *
> -					(size_t)PAGE_CACHE_SIZE -
> +		size_t write_bytes = min(iov_iter_count(&i),
> +					 nrptrs * (size_t)PAGE_CACHE_SIZE -
>  					 offset);
>  		size_t num_pages = (write_bytes + PAGE_CACHE_SIZE - 1) >>
>  					PAGE_CACHE_SHIFT;
> @@ -997,7 +987,7 @@ static ssize_t btrfs_file_write(struct file *file, const char __user *buf,
>  		}
>  
>  		ret = btrfs_copy_from_user(pos, num_pages,
> -					   write_bytes, pages, buf);
> +					   write_bytes, pages, &i);
>  		if (ret) {
>  			btrfs_free_reserved_data_space(root, inode,
>  						       write_bytes);
> @@ -1026,8 +1016,6 @@ static ssize_t btrfs_file_write(struct file *file, const char __user *buf,
>  			btrfs_throttle(root);
>  		}
>  
> -		buf += write_bytes;
> -		count -= write_bytes;
>  		pos += write_bytes;
>  		num_written += write_bytes;
>  
> @@ -1222,7 +1210,7 @@ const struct file_operations btrfs_file_operations = {
>  	.read		= do_sync_read,
>  	.aio_read       = generic_file_aio_read,
>  	.splice_read	= generic_file_splice_read,
> -	.write		= btrfs_file_write,
> +	.aio_write	= btrfs_file_aio_write,
>  	.mmap		= btrfs_file_mmap,
>  	.open		= generic_file_open,
>  	.release	= btrfs_release_file,

-- 
Shi Weihua

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/6] Btrfs: do aio_write instead of write
  2010-05-23  8:31   ` Shi Weihua
@ 2010-05-23 23:59     ` Josef Bacik
  0 siblings, 0 replies; 19+ messages in thread
From: Josef Bacik @ 2010-05-23 23:59 UTC (permalink / raw)
  To: Shi Weihua
  Cc: Josef Bacik, linux-btrfs, linux-fsdevel, linux-kernel, hch, akpm

On Sun, May 23, 2010 at 04:31:52PM +0800, Shi Weihua wrote:
> at 2010-5-22 1:03, Josef Bacik wrote:
> > In order for AIO to work, we need to implement aio_write.  This patch converts
> > our btrfs_file_write to btrfs_aio_write.  I've tested this with xfstests and
> > nothing broke, and the AIO stuff magically started working.  Thanks,
> 
> But xfstests's case 198(source: src/aio-dio-regress/aiodio_sparse2.c) still failed, 
> following message outputted.
> --------------------
> AIO write offset 0 expected 65536 got -22
> AIO write offset 5242880 expected 65536 got -22
> AIO write offset 10485760 expected 65536 got -22
> AIO write offset 15728640 expected 65536 got -22
> AIO write offset 20971520 expected 65536 got -22
> AIO write offset 31457280 expected 65536 got -22
> AIO write offset 36700160 expected 65536 got -22
> AIO write offset 41943040 expected 65536 got -22
> AIO write offset 47185920 expected 65536 got -22
> AIO write offset 52428800 expected 65536 got -22
> AIO write offset 57671680 expected 65536 got -22
> AIO write offset 62914560 expected 65536 got -22
> AIO write offset 73400320 expected 65536 got -22
> AIO write offset 78643200 expected 65536 got -22
> non one buffer at buf[0] => 0x00,00,00,00
> non-one read at offset 0
> *** WARNING *** /tmp/aaaa has not been unlinked; if you don't rm it manually first, it may influence the next run
> -------------------- 
> 
> generic_file_direct_write()(in btrfs_file_aio_write(), fs/btrfs/file.c) returned -22, 
> maybe it's useful for your analysing.

Yes, change that testcase to run -a 4096 and it will run fine.  Because BTRFS
doesn't pass in a bdev to __blockdev_direct_IO it doesn't do 512 byte aligned
IO, just blocksize aligned IO.  I will fix that at some later point, but its a
little tricky since we have to figure out which bdev has the largest alignment
(in case we have a 4k sector device and a 512 byte sector device in the same
volume).  Thanks,

Josef

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/6] Btrfs: do aio_write instead of write
  2010-05-21 17:03 ` [PATCH 6/6] Btrfs: do aio_write instead of write Josef Bacik
  2010-05-23  8:31   ` Shi Weihua
@ 2010-05-27  3:06   ` liubo
  2010-05-27  3:13     ` liubo
  2010-05-27 12:59     ` Chris Mason
  2010-05-30  9:33   ` Dmitri Nikulin
  2 siblings, 2 replies; 19+ messages in thread
From: liubo @ 2010-05-27  3:06 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, linux-fsdevel, linux-kernel

On 05/22/2010 01:03 AM, Josef Bacik wrote:
> In order for AIO to work, we need to implement aio_write.  This patch converts
> our btrfs_file_write to btrfs_aio_write.  I've tested this with xfstests and
> nothing broke, and the AIO stuff magically started working.  Thanks,
>
> Signed-off-by: Josef Bacik <josef@redhat.com>
>   

Hi, Josef,

I've tested your patch(May 22) with my tools, and one case triggered a bug
which made writev operation hang up, more information is followed.

- Steps to trigger it:
  # mount /dev/sda8 /home/btrfsdisk -o nodatacow
  # gcc direct-io.c -o direct-io
  # ./direct-io O_DIRECT writev /home/btrfsdisk/testrw 4M

then on another tty, after "dmesg"...

[snip]
device fsid f44b0879c75c0e99-1d4b28f2d5c503ae devid 1 transid 11177
/dev/sda8
btrfs: setting nodatacow
INFO: task direct-io:1399 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
direct-io     D 0000000000000003     0  1399   1341 0x00000000
 ffff880137c379c8 0000000000000082 ffff880137c379d8 ffffffff00000000
 ffff880137c37fd8 ffff880139730000 0000000000015440 ffff880137c37fd8
 0000000000015440 0000000000015440 0000000000015440 0000000000015440
Call Trace:
 [<ffffffffa0119d4a>] wait_extent_bit+0xe3/0x163 [btrfs]
 [<ffffffff8106651f>] ? autoremove_wake_function+0x0/0x39
 [<ffffffffa0119e47>] lock_extent_bits+0x7d/0xa8 [btrfs]
 [<ffffffffa0119e88>] lock_extent+0x16/0x18 [btrfs]
 [<ffffffffa01025ce>] btrfs_direct_IO+0x8e/0x1be [btrfs]
 [<ffffffff810c7301>] generic_file_direct_write+0xed/0x16d
 [<ffffffffa010bb91>] btrfs_file_aio_write+0x2af/0x8d2 [btrfs]
 [<ffffffff81100eae>] ? try_get_mem_cgroup_from_mm+0x39/0x49
 [<ffffffffa010b8e2>] ? btrfs_file_aio_write+0x0/0x8d2 [btrfs]
 [<ffffffff811063ed>] do_sync_readv_writev+0xc1/0x100
 [<ffffffff81106120>] ? might_fault+0x21/0x23
 [<ffffffff81106151>] ? copy_from_user+0x2f/0x31
 [<ffffffff811c90ab>] ? security_file_permission+0x16/0x18
 [<ffffffff81107145>] do_readv_writev+0xa7/0x127
 [<ffffffff81107208>] vfs_writev+0x43/0x4e
 [<ffffffff811072f8>] sys_writev+0x4a/0x93
 [<ffffffff81009c32>] system_call_fastpath+0x16/0x1b


So, can you figure out if anything in your patch leads to the bug?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/6] Btrfs: do aio_write instead of write
  2010-05-27  3:06   ` liubo
@ 2010-05-27  3:13     ` liubo
  2010-05-27 12:59     ` Chris Mason
  1 sibling, 0 replies; 19+ messages in thread
From: liubo @ 2010-05-27  3:13 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, linux-fsdevel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2625 bytes --]

On 05/27/2010 11:06 AM, liubo wrote:
> On 05/22/2010 01:03 AM, Josef Bacik wrote:
>   
>> In order for AIO to work, we need to implement aio_write.  This patch converts
>> our btrfs_file_write to btrfs_aio_write.  I've tested this with xfstests and
>> nothing broke, and the AIO stuff magically started working.  Thanks,
>>
>> Signed-off-by: Josef Bacik <josef@redhat.com>
>>   
>>     
>
> Hi, Josef,
>
> I've tested your patch(May 22) with my tools, and one case triggered a bug
> which made writev operation hang up, more information is followed.
>
> - Steps to trigger it:
>   # mount /dev/sda8 /home/btrfsdisk -o nodatacow
>   # gcc direct-io.c -o direct-io
>   # ./direct-io O_DIRECT writev /home/btrfsdisk/testrw 4M
>
> then on another tty, after "dmesg"...
>
> [snip]
> device fsid f44b0879c75c0e99-1d4b28f2d5c503ae devid 1 transid 11177
> /dev/sda8
> btrfs: setting nodatacow
> INFO: task direct-io:1399 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> direct-io     D 0000000000000003     0  1399   1341 0x00000000
>  ffff880137c379c8 0000000000000082 ffff880137c379d8 ffffffff00000000
>  ffff880137c37fd8 ffff880139730000 0000000000015440 ffff880137c37fd8
>  0000000000015440 0000000000015440 0000000000015440 0000000000015440
> Call Trace:
>  [<ffffffffa0119d4a>] wait_extent_bit+0xe3/0x163 [btrfs]
>  [<ffffffff8106651f>] ? autoremove_wake_function+0x0/0x39
>  [<ffffffffa0119e47>] lock_extent_bits+0x7d/0xa8 [btrfs]
>  [<ffffffffa0119e88>] lock_extent+0x16/0x18 [btrfs]
>  [<ffffffffa01025ce>] btrfs_direct_IO+0x8e/0x1be [btrfs]
>  [<ffffffff810c7301>] generic_file_direct_write+0xed/0x16d
>  [<ffffffffa010bb91>] btrfs_file_aio_write+0x2af/0x8d2 [btrfs]
>  [<ffffffff81100eae>] ? try_get_mem_cgroup_from_mm+0x39/0x49
>  [<ffffffffa010b8e2>] ? btrfs_file_aio_write+0x0/0x8d2 [btrfs]
>  [<ffffffff811063ed>] do_sync_readv_writev+0xc1/0x100
>  [<ffffffff81106120>] ? might_fault+0x21/0x23
>  [<ffffffff81106151>] ? copy_from_user+0x2f/0x31
>  [<ffffffff811c90ab>] ? security_file_permission+0x16/0x18
>  [<ffffffff81107145>] do_readv_writev+0xa7/0x127
>  [<ffffffff81107208>] vfs_writev+0x43/0x4e
>  [<ffffffff811072f8>] sys_writev+0x4a/0x93
>  [<ffffffff81009c32>] system_call_fastpath+0x16/0x1b
>
>
> So, can you figure out if anything in your patch leads to the bug?
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
>   

Sorry, I forgot the attachment...

Plz get it.

Thanks,

- Liubo


[-- Attachment #2: direct-io.c --]
[-- Type: text/plain, Size: 6652 bytes --]

/******************************************************************************/
/*                                                                            */
/* Copyright (c) 2010 FUJITSU LIMITED                                         */
/*                                                                            */
/* This program is free software;  you can redistribute it and/or modify      */
/* it under the terms of the GNU General Public License as published by       */
/* the Free Software Foundation; either version 2 of the License, or          */
/* (at your option) any later version.                                        */
/*                                                                            */
/* This program is distributed in the hope that it will be useful,            */
/* but WITHOUT ANY WARRANTY;  without even the implied warranty of            */
/* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See                  */
/* the GNU General Public License for more details.                           */
/*                                                                            */
/* You should have received a copy of the GNU General Public License          */
/* along with this program;  if not, write to the Free Software               */
/* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA    */
/*                                                                            */
/* Author: Liu Bo <liubo2009@cn.fujitsu.com>                                  */
/*                                                                            */
/******************************************************************************/


#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/uio.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <string.h>
#include <malloc.h>

#define BUFFERSIZE 8192

#ifndef O_DIRECT
#define O_DIRECT	00040000	/* direct disk access hint */
#endif

static int rw_test(char *type, int fd, int nullfd, int zerofd, char *size);


/* Direct-io Test 
 *
 * Return Value:
 * 	0: succeed
 * 	1: fail
 *
 * Parse command line arguments 
 * flag: 
 * 	O_DIRECT, O_DIRECT | O_SYNC
 * rwtype:
 * 	read, readv, write, writev
 * test file name:
 * 	testrw
 * test file size:
 * 	0 | 1K | 4K | 4M | 4G | 4G+1	
 */
int main(int argc, char *argv[])
{
	char *rwtype;
	char *testfile;
	char *size;
	int fd, nullfd, zerofd;
	int flag = O_RDWR;
	int ret;

	if (--argc != 4) {
		fprintf(stderr, "4 arguments are needed!\n");
		exit(1);
	}

	if (!strcmp(argv[1], "O_DIRECT"))
		flag |= O_DIRECT;
	else if (!strcmp(argv[1], "O_DIRECT | O_SYNC") ||
				!strcmp(argv[1], "O_DIRECT|O_SYNC"))
		flag |= O_DIRECT | O_SYNC;

	rwtype = argv[2];
	testfile = argv[3];
	size = argv[4];

	if ((fd = open(testfile, flag)) < 0) {
		fprintf(stderr, "Cannot open the test file! \
					: %s\n\n", strerror(errno));
		exit(1);
	}
	if ((nullfd = open("/dev/null", O_WRONLY)) < 0) {
		fprintf(stderr, "Cannot open the test file! \
					: %s\n\n", strerror(errno));
		exit(1);
	}
	if ((zerofd = open("/dev/zero", O_RDONLY)) < 0) {
		fprintf(stderr, "Cannot open the test file! \
					: %s\n\n", strerror(errno));
		exit(1);
	}

	ret = rw_test(rwtype, fd, nullfd, zerofd, size);

	close(fd);
	close(nullfd);
	close(zerofd);

	if (ret != 0)
		return 1;
	return 0;
}

/* 
 echo "=========================" >> $RESULTFILE
 * rw_test is used to test read/write/readv/writev.
 *
 * Arguments:
 * 	  type: read/write/readv/writev
 * 	    fd: read/write file descripter
 * 	nullfd: write file descripter binded to /dev/null
 *	zerofd: read file descripter binded to /dev/zero
 *	  size: 0 | 1K | 4K | 4M | 4G | 4G+1
 *
 * Return value:
 * 	 0: succeed
 * 	 1: write error
 * 	-1: read error
 */
static int rw_test(char *type, int fd, int nullfd, int zerofd, char *size)
{
	ssize_t n;
	int iovcnt;
	long limit = 0;

	if(!strcmp(size, "1K")) {
		limit = 1024;
	} else if (!strcmp(size, "4K")) {
		limit = 1024 * 4;
	} else if (!strcmp(size, "4M")) {
		limit = 1024 * 1024 * 4;
	} else if (!strcmp(size, "4G")) {
		limit = 1024 * 1024 * 4;
		limit *= 1024;
	} else if (!strcmp(size, "4G+1")) {
		limit = 1024 * 1024 * 4;
		limit *= 1024;
		limit += 1;
	}

	if (!strcmp(type, "read")) {
		char *buf;

		buf = valloc(BUFFERSIZE);
		if (!buf) {
			perror("valloc error");
			exit(1);
		}
		memset(buf, 0, BUFFERSIZE);

		while ((n = read(fd, buf, BUFFERSIZE)) > 0) {
			if (write(nullfd, buf, n) != n) {
				fprintf(stderr, "write error! :\
						 %s\n", strerror(errno));
				return 1;
			}
		}

		if (n < 0) {
			fprintf(stderr, "read error : %s\n\n", \
							strerror(errno));
			return -1;
		}

	} else if (!strcmp(type, "write")) {
		char *buf;
		long count = 0;

		buf = valloc(BUFFERSIZE);
		if (!buf) {
			perror("valloc error");
			exit(1);
		}
		memset(buf, 0, BUFFERSIZE);

		while ((n = read(zerofd, buf, BUFFERSIZE)) > 0) {
			if (write(fd, buf, n) != n) {
				fprintf(stderr, "write error! :\
						 %s\n", strerror(errno));
				return 1;
			}
			count += n;
			if (count >= limit)
				break;
		}

		if (count < limit) {
			fprintf(stderr, "read error : %s\n\n", \
							strerror(errno));
			return -1;
		}

	} else if (!strcmp(type, "readv")) {
		iovcnt = 2;
		char *buf[iovcnt];
		struct iovec iov[iovcnt];
		int i;

		for (i = 0; i < iovcnt; i++) {
			buf[i] = valloc(BUFFERSIZE);
			if (!buf[i]) {
				perror("valloc error");
				exit(1);
			}

			memset(buf[i], 0, BUFFERSIZE);

			iov[i].iov_base = buf[i];
			iov[i].iov_len = BUFFERSIZE;
		}

		while ((n = readv(fd, iov, iovcnt)) > 0) {
			if (writev(nullfd, iov, iovcnt) != 
					(iovcnt * BUFFERSIZE)) {
				fprintf(stderr, "writev error! :\
						 %s\n", strerror(errno));
				return 1;
			}
		}

		if (n < 0) {
			fprintf(stderr, "readv error : %s\n\n", \
							strerror(errno));
			return -1;
		}

	} else if (!strcmp(type, "writev")) {
		iovcnt = 2;
		char *buf[iovcnt];
		struct iovec iov[iovcnt];
		long count = 0;
		int i;

		for (i = 0; i < iovcnt; i++) {
			buf[i] = valloc(BUFFERSIZE);
			if (!buf[i]) {
				perror("valloc error");
				exit(1);
			}

			memset(buf[i], 0, BUFFERSIZE);

			iov[i].iov_base = buf[i];
			iov[i].iov_len = BUFFERSIZE;
		}

		while ((n = readv(zerofd, iov, iovcnt)) > 0) {
			if (writev(fd, iov, iovcnt) != 
					(iovcnt * BUFFERSIZE)) {
				fprintf(stderr, "writev error! :\
						 %s\n", strerror(errno));
				return 1;
			}
			count += n;
			if (count >= limit)
				break;
		}

		if (count < limit) {
			fprintf(stderr, "readv error : %s\n\n", \
							strerror(errno));
			return -1;
		}

	}

	return 0;
}

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/6] Btrfs: do aio_write instead of write
  2010-05-27  3:06   ` liubo
  2010-05-27  3:13     ` liubo
@ 2010-05-27 12:59     ` Chris Mason
  2010-05-28  1:42       ` liubo
  1 sibling, 1 reply; 19+ messages in thread
From: Chris Mason @ 2010-05-27 12:59 UTC (permalink / raw)
  To: liubo; +Cc: Josef Bacik, linux-btrfs, linux-fsdevel, linux-kernel

On Thu, May 27, 2010 at 11:06:54AM +0800, liubo wrote:
> On 05/22/2010 01:03 AM, Josef Bacik wrote:
> > In order for AIO to work, we need to implement aio_write.  This patch converts
> > our btrfs_file_write to btrfs_aio_write.  I've tested this with xfstests and
> > nothing broke, and the AIO stuff magically started working.  Thanks,
> >
> > Signed-off-by: Josef Bacik <josef@redhat.com>
> >   
> 
> Hi, Josef,
> 
> I've tested your patch(May 22) with my tools, and one case triggered a bug
> which made writev operation hang up, more information is followed.
> 
> - Steps to trigger it:
>   # mount /dev/sda8 /home/btrfsdisk -o nodatacow
>   # gcc direct-io.c -o direct-io
>   # ./direct-io O_DIRECT writev /home/btrfsdisk/testrw 4M

Thanks for sending along this test program and bug report.  We've fixed
a few bugs in the O_DIRECT patches, and this is working now.

The merged result is in the for-linus branch of the btrfs unstable tree.

-chris

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/6] Btrfs: do aio_write instead of write
  2010-05-27 12:59     ` Chris Mason
@ 2010-05-28  1:42       ` liubo
  0 siblings, 0 replies; 19+ messages in thread
From: liubo @ 2010-05-28  1:42 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, linux-btrfs, linux-fsdevel, linux-kernel

On 05/27/2010 08:59 PM, Chris Mason wrote:
> Thanks for sending along this test program and bug report.  We've fixed
> a few bugs in the O_DIRECT patches, and this is working now.
>
> The merged result is in the for-linus branch of the btrfs unstable tree.
>
> -chris
>   

Oh, Thanks a lot. I've see your great effort.

-liubo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/6] Btrfs: do aio_write instead of write
  2010-05-21 17:03 ` [PATCH 6/6] Btrfs: do aio_write instead of write Josef Bacik
  2010-05-23  8:31   ` Shi Weihua
  2010-05-27  3:06   ` liubo
@ 2010-05-30  9:33   ` Dmitri Nikulin
  2010-06-01 13:19     ` Chris Mason
  2 siblings, 1 reply; 19+ messages in thread
From: Dmitri Nikulin @ 2010-05-30  9:33 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs

On Sat, May 22, 2010 at 3:03 AM, Josef Bacik <josef@redhat.com> wrote:
> + =C2=A0 =C2=A0 =C2=A0 while (1) {
> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 lock_extent(tree, =
start, end, GFP_NOFS);
> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ordered =3D btrfs_=
lookup_ordered_extent(inode, start);
> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 if (!ordered)
> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0 break;
> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 unlock_extent(tree=
, start, end, GFP_NOFS);

Is it ok not to unlock_extent if !ordered?
I don't know if you fixed this in a later version but it stuck out to m=
e :)

--=20
Dmitri Nikulin

Centre for Synchrotron Science
Monash University
Victoria 3800, Australia
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" =
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/6] Btrfs: do aio_write instead of write
  2010-05-30  9:33   ` Dmitri Nikulin
@ 2010-06-01 13:19     ` Chris Mason
  2010-06-01 23:20       ` Dmitri Nikulin
  0 siblings, 1 reply; 19+ messages in thread
From: Chris Mason @ 2010-06-01 13:19 UTC (permalink / raw)
  To: Dmitri Nikulin; +Cc: Josef Bacik, linux-btrfs

On Sun, May 30, 2010 at 07:33:33PM +1000, Dmitri Nikulin wrote:
> On Sat, May 22, 2010 at 3:03 AM, Josef Bacik <josef@redhat.com> wrote=
:
> > + =A0 =A0 =A0 while (1) {
> > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 lock_extent(tree, start, end, GFP_NOF=
S);
> > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 ordered =3D btrfs_lookup_ordered_exte=
nt(inode, start);
> > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (!ordered)
> > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 break;
> > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 unlock_extent(tree, start, end, GFP_N=
OFS);
>=20
> Is it ok not to unlock_extent if !ordered?
> I don't know if you fixed this in a later version but it stuck out to=
 me :)

The construct is confusing.  Ordered extents track things that we have
allocated on disk and need to write.  New ones can't be created while w=
e
have the extent range locked.  But we can't force old ones to disk with
the lock held.

So, we lock then lookup and if we find nothing we can safely do our
operation because no io is in progress.  We unlock a little later on, o=
r
at endio time.

If we find an ordered extent we drop the lock and wait for that IO to
finish, then loop again.

-chris

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" =
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/6] Btrfs: do aio_write instead of write
  2010-06-01 13:19     ` Chris Mason
@ 2010-06-01 23:20       ` Dmitri Nikulin
  0 siblings, 0 replies; 19+ messages in thread
From: Dmitri Nikulin @ 2010-06-01 23:20 UTC (permalink / raw)
  To: linux-btrfs

On Tue, Jun 1, 2010 at 11:19 PM, Chris Mason <chris.mason@oracle.com> w=
rote:
>> Is it ok not to unlock_extent if !ordered?
>> I don't know if you fixed this in a later version but it stuck out t=
o me :)
>
> The construct is confusing. =C2=A0Ordered extents track things that w=
e have
> allocated on disk and need to write. =C2=A0New ones can't be created =
while we
> have the extent range locked. =C2=A0But we can't force old ones to di=
sk with
> the lock held.
>
> So, we lock then lookup and if we find nothing we can safely do our
> operation because no io is in progress. =C2=A0We unlock a little late=
r on, or
> at endio time.
>
> If we find an ordered extent we drop the lock and wait for that IO to
> finish, then loop again.

Ok, that's fair enough. Maybe it's worth commenting, I'm sure I'm not
the only one surprised.

Thanks,

--=20
Dmitri Nikulin

Centre for Synchrotron Science
Monash University
Victoria 3800, Australia
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" =
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/6] fs: kill blockdev_direct_IO_no_locking
  2010-05-19 20:24 ` [PATCH 4/6] fs: kill blockdev_direct_IO_no_locking Josef Bacik
@ 2010-05-21 15:21   ` Christoph Hellwig
  0 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2010-05-21 15:21 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, linux-fsdevel, linux-kernel, hch, akpm

On Wed, May 19, 2010 at 04:24:52PM -0400, Josef Bacik wrote:
> Christoph said he'd rather everybody use __blockdev_direct_IO directly instead
> of having a bunch of random helper functions, so thats what this patch does.
> It's a basic change, I've tested it with xfstests on ext4 and xfs.  Thanks,
> 
> Signed-off-by: Josef Bacik <josef@redhat.com>

Ok,


Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 4/6] fs: kill blockdev_direct_IO_no_locking
  2010-05-19 20:24 [PATCH 1/6] fs: allow short direct-io reads to be completed via buffered IO V2 Josef Bacik
@ 2010-05-19 20:24 ` Josef Bacik
  2010-05-21 15:21   ` Christoph Hellwig
  0 siblings, 1 reply; 19+ messages in thread
From: Josef Bacik @ 2010-05-19 20:24 UTC (permalink / raw)
  To: linux-btrfs, linux-fsdevel, linux-kernel, hch, akpm

Christoph said he'd rather everybody use __blockdev_direct_IO directly instead
of having a bunch of random helper functions, so thats what this patch does.
It's a basic change, I've tested it with xfstests on ext4 and xfs.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
---
 fs/block_dev.c              |    5 +++--
 fs/ext4/inode.c             |    8 ++++----
 fs/gfs2/aops.c              |    6 +++---
 fs/ocfs2/aops.c             |    8 +++-----
 fs/xfs/linux-2.6/xfs_aops.c |    7 +++----
 include/linux/fs.h          |    9 ---------
 6 files changed, 16 insertions(+), 27 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 6dcee88..0f42cbc 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -172,8 +172,9 @@ blkdev_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
 	struct file *file = iocb->ki_filp;
 	struct inode *inode = file->f_mapping->host;
 
-	return blockdev_direct_IO_no_locking(rw, iocb, inode, I_BDEV(inode),
-				iov, offset, nr_segs, blkdev_get_blocks, NULL);
+	return __blockdev_direct_IO(rw, iocb, inode, I_BDEV(inode), iov,
+				    offset, nr_segs, blkdev_get_blocks, NULL,
+				    NULL, 0);
 }
 
 int __sync_blockdev(struct block_device *bdev, int wait)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 81d6054..8f37762 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3494,10 +3494,10 @@ static ssize_t ext4_ind_direct_IO(int rw, struct kiocb *iocb,
 
 retry:
 	if (rw == READ && ext4_should_dioread_nolock(inode))
-		ret = blockdev_direct_IO_no_locking(rw, iocb, inode,
-				 inode->i_sb->s_bdev, iov,
-				 offset, nr_segs,
-				 ext4_get_block, NULL);
+		ret = __blockdev_direct_IO(rw, iocb, inode,
+					   inode->i_sb->s_bdev, iov, offset,
+					   nr_segs, ext4_get_block, NULL, NULL,
+					   0);
 	else
 		ret = blockdev_direct_IO(rw, iocb, inode,
 				 inode->i_sb->s_bdev, iov,
diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 0c1d0b8..45b23b0 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -1039,9 +1039,9 @@ static ssize_t gfs2_direct_IO(int rw, struct kiocb *iocb,
 	if (rv != 1)
 		goto out; /* dio not valid, fall back to buffered i/o */
 
-	rv = blockdev_direct_IO_no_locking(rw, iocb, inode, inode->i_sb->s_bdev,
-					   iov, offset, nr_segs,
-					   gfs2_get_block_direct, NULL);
+	rv = __blockdev_direct_IO(rw, iocb, inode, inode->i_sb->s_bdev,
+				  iov, offset, nr_segs, gfs2_get_block_direct,
+				  NULL, NULL, 0);
 out:
 	gfs2_glock_dq_m(1, &gh);
 	gfs2_holder_uninit(&gh);
diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 21441dd..f2e53a9 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -669,11 +669,9 @@ static ssize_t ocfs2_direct_IO(int rw,
 	if (i_size_read(inode) <= offset)
 		return 0;
 
-	ret = blockdev_direct_IO_no_locking(rw, iocb, inode,
-					    inode->i_sb->s_bdev, iov, offset,
-					    nr_segs,
-					    ocfs2_direct_IO_get_blocks,
-					    ocfs2_dio_end_io);
+	ret = __blockdev_direct_IO(rw, iocb, inode, inode->i_sb->s_bdev, iov,
+				   offset, nr_segs, ocfs2_direct_IO_get_blocks,
+				   ocfs2_dio_end_io, NULL, 0);
 
 	mlog_exit(ret);
 	return ret;
diff --git a/fs/xfs/linux-2.6/xfs_aops.c b/fs/xfs/linux-2.6/xfs_aops.c
index 0f8b996..fcc14d8 100644
--- a/fs/xfs/linux-2.6/xfs_aops.c
+++ b/fs/xfs/linux-2.6/xfs_aops.c
@@ -1617,10 +1617,9 @@ xfs_vm_direct_IO(
 	iocb->private = xfs_alloc_ioend(inode, rw == WRITE ?
 					IOMAP_UNWRITTEN : IOMAP_READ);
 
-	ret = blockdev_direct_IO_no_locking(rw, iocb, inode, bdev, iov,
-					    offset, nr_segs,
-					    xfs_get_blocks_direct,
-					    xfs_end_io_direct);
+	ret = __blockdev_direct_IO(rw, iocb, inode, bdev, iov, offset, nr_segs,
+				   xfs_get_blocks_direct, xfs_end_io_direct,
+				   NULL, 0);
 
 	if (unlikely(ret != -EIOCBQUEUED && iocb->private))
 		xfs_destroy_ioend(iocb->private);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 10704f0..d841e7d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2277,15 +2277,6 @@ static inline ssize_t blockdev_direct_IO(int rw, struct kiocb *iocb,
 				    nr_segs, get_block, end_io, NULL,
 				    DIO_LOCKING | DIO_SKIP_HOLES);
 }
-
-static inline ssize_t blockdev_direct_IO_no_locking(int rw, struct kiocb *iocb,
-	struct inode *inode, struct block_device *bdev, const struct iovec *iov,
-	loff_t offset, unsigned long nr_segs, get_block_t get_block,
-	dio_iodone_t end_io)
-{
-	return __blockdev_direct_IO(rw, iocb, inode, bdev, iov, offset,
-				    nr_segs, get_block, end_io, NULL, 0);
-}
 #endif
 
 extern const struct file_operations generic_ro_fops;
-- 
1.6.6.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2010-06-01 23:20 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-05-21 17:03 [PATCH 1/6] fs: allow short direct-io reads to be completed via buffered IO Josef Bacik
2010-05-21 17:03 ` [PATCH 2/6] direct-io: add a hook for the fs to provide its own submit_bio function Josef Bacik
2010-05-21 17:03 ` [PATCH 3/6] direct-io: do not merge logically non-contiguous requests Josef Bacik
2010-05-22  1:47   ` Mike Fedyk
2010-05-22 14:02     ` Josef Bacik
2010-05-21 17:03 ` [PATCH 4/6] fs: kill blockdev_direct_IO_no_locking Josef Bacik
2010-05-21 17:03 ` [PATCH 5/6] Btrfs: add basic DIO read/write support Josef Bacik
2010-05-21 17:03 ` [PATCH 6/6] Btrfs: do aio_write instead of write Josef Bacik
2010-05-23  8:31   ` Shi Weihua
2010-05-23 23:59     ` Josef Bacik
2010-05-27  3:06   ` liubo
2010-05-27  3:13     ` liubo
2010-05-27 12:59     ` Chris Mason
2010-05-28  1:42       ` liubo
2010-05-30  9:33   ` Dmitri Nikulin
2010-06-01 13:19     ` Chris Mason
2010-06-01 23:20       ` Dmitri Nikulin
  -- strict thread matches above, loose matches on Subject: below --
2010-05-19 20:24 [PATCH 1/6] fs: allow short direct-io reads to be completed via buffered IO V2 Josef Bacik
2010-05-19 20:24 ` [PATCH 4/6] fs: kill blockdev_direct_IO_no_locking Josef Bacik
2010-05-21 15:21   ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).