All of lore.kernel.org
* [PATCH v2] direct-io: only inc/dec inode->i_dio_count for file systems
@ 2015-04-15 22:01 Jens Axboe
  2015-04-15 22:01 ` [PATCH 1/3] " Jens Axboe
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Jens Axboe @ 2015-04-15 22:01 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel

Hi,

A proper patch series posting of the patch to skip inode dio counts
where we don't need them. Performance results and general notes in
patch #1. We can take this a bit further if we want to push this
truncate locking into the caller, but I'd like to keep that as a
separate patch series instead.

Changes since v1:

- Changed flag to DIO_SKIP_DIO_COUNT. Patch 2+3 will use this for
  skipping the extra inc/dec for btrfs and ext4, so better rename it
  to something that better tells what is happening in the deep mess
  that is dio.

- Have inode_dio_inc/dec() helpers. Again, this better tells the caller
  what is going on.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/3] direct-io: only inc/dec inode->i_dio_count for file systems
  2015-04-15 22:01 [PATCH v2] direct-io: only inc/dec inode->i_dio_count for file systems Jens Axboe
@ 2015-04-15 22:01 ` Jens Axboe
  2015-04-15 22:36   ` Dave Chinner
  2015-04-15 22:01 ` [PATCH 2/3] btrfs: pass in DIO_SKIP_DIO_COUNT to do_blockdev_direct_IO() Jens Axboe
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Jens Axboe @ 2015-04-15 22:01 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel
  Cc: Jens Axboe, Andrew Morton, Christoph Hellwig, Theodore Ts'o,
	Elliott, Robert (Server Storage),
	Al Viro

do_blockdev_direct_IO() increments and decrements the inode
->i_dio_count for each IO operation. It does this to protect against
truncate of a file. Block devices don't need this sort of protection.

For a capable multiqueue setup, this atomic int is the only shared
state between applications accessing the device for O_DIRECT, and it
presents a scaling wall for that. In my testing, as much as 30% of
system time is spent incrementing and decrementing this value. A mixed
read/write workload improved from ~2.5M IOPS to ~9.6M IOPS, with
better latencies too. Before:

clat percentiles (usec):
 |  1.00th=[   33],  5.00th=[   34], 10.00th=[   34], 20.00th=[   34],
 | 30.00th=[   34], 40.00th=[   34], 50.00th=[   35], 60.00th=[   35],
 | 70.00th=[   35], 80.00th=[   35], 90.00th=[   37], 95.00th=[   80],
 | 99.00th=[   98], 99.50th=[  151], 99.90th=[  155], 99.95th=[  155],
 | 99.99th=[  165]

After:

clat percentiles (usec):
 |  1.00th=[   95],  5.00th=[  108], 10.00th=[  129], 20.00th=[  149],
 | 30.00th=[  155], 40.00th=[  161], 50.00th=[  167], 60.00th=[  171],
 | 70.00th=[  177], 80.00th=[  185], 90.00th=[  201], 95.00th=[  270],
 | 99.00th=[  390], 99.50th=[  398], 99.90th=[  418], 99.95th=[  422],
 | 99.99th=[  438]

In other setups, Robert Elliott reported seeing good performance
improvements:

https://lkml.org/lkml/2015/4/3/557

The more applications accessing the device, the worse it gets.

Add a new direct-io flag, DIO_SKIP_DIO_COUNT, which tells
do_blockdev_direct_IO() that it need not worry about incrementing
or decrementing the inode i_dio_count for this caller.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Elliott, Robert (Server Storage) <elliott@hp.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jens Axboe <axboe@fb.com>
---
 fs/block_dev.c     |  2 +-
 fs/btrfs/inode.c   |  6 +++---
 fs/dax.c           |  4 ++--
 fs/direct-io.c     |  7 +++++--
 fs/ext4/indirect.c |  6 +++---
 fs/ext4/inode.c    |  4 ++--
 fs/inode.c         | 19 ++++++++++++++++---
 fs/nfs/direct.c    | 10 +++++-----
 include/linux/fs.h |  6 +++++-
 9 files changed, 42 insertions(+), 22 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 975266be67d3..be1335dbd6be 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -155,7 +155,7 @@ blkdev_direct_IO(int rw, struct kiocb *iocb, struct iov_iter *iter,
 
 	return __blockdev_direct_IO(rw, iocb, inode, I_BDEV(inode), iter,
 				    offset, blkdev_get_block,
-				    NULL, NULL, 0);
+				    NULL, NULL, DIO_SKIP_DIO_COUNT);
 }
 
 int __sync_blockdev(struct block_device *bdev, int wait)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d2e732d7af52..6fe341a66ed8 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8129,7 +8129,7 @@ static ssize_t btrfs_direct_IO(int rw, struct kiocb *iocb,
 	if (check_direct_IO(BTRFS_I(inode)->root, rw, iocb, iter, offset))
 		return 0;
 
-	atomic_inc(&inode->i_dio_count);
+	inode_dio_inc(inode);
 	smp_mb__after_atomic();
 
 	/*
@@ -8169,7 +8169,7 @@ static ssize_t btrfs_direct_IO(int rw, struct kiocb *iocb,
 		current->journal_info = &outstanding_extents;
 	} else if (test_bit(BTRFS_INODE_READDIO_NEED_LOCK,
 				     &BTRFS_I(inode)->runtime_flags)) {
-		inode_dio_done(inode);
+		inode_dio_dec(inode);
 		flags = DIO_LOCKING | DIO_SKIP_HOLES;
 		wakeup = false;
 	}
@@ -8188,7 +8188,7 @@ static ssize_t btrfs_direct_IO(int rw, struct kiocb *iocb,
 	}
 out:
 	if (wakeup)
-		inode_dio_done(inode);
+		inode_dio_dec(inode);
 	if (relock)
 		mutex_lock(&inode->i_mutex);
 
diff --git a/fs/dax.c b/fs/dax.c
index ed1619ec6537..1d4a9a54a938 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -210,7 +210,7 @@ ssize_t dax_do_io(int rw, struct kiocb *iocb, struct inode *inode,
 	}
 
 	/* Protects against truncate */
-	atomic_inc(&inode->i_dio_count);
+	inode_dio_inc(inode);
 
 	retval = dax_io(rw, inode, iter, pos, end, get_block, &bh);
 
@@ -220,7 +220,7 @@ ssize_t dax_do_io(int rw, struct kiocb *iocb, struct inode *inode,
 	if ((retval > 0) && end_io)
 		end_io(iocb, pos, retval, bh.b_private);
 
-	inode_dio_done(inode);
+	inode_dio_dec(inode);
  out:
 	return retval;
 }
diff --git a/fs/direct-io.c b/fs/direct-io.c
index e181b6b2e297..cba348e3135f 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -254,7 +254,9 @@ static ssize_t dio_complete(struct dio *dio, loff_t offset, ssize_t ret,
 	if (dio->end_io && dio->result)
 		dio->end_io(dio->iocb, offset, transferred, dio->private);
 
-	inode_dio_done(dio->inode);
+	if (!(dio->flags & DIO_SKIP_DIO_COUNT))
+		inode_dio_dec(dio->inode);
+
 	if (is_async) {
 		if (dio->rw & WRITE) {
 			int err;
@@ -1199,7 +1201,8 @@ do_blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
 	/*
 	 * Will be decremented at I/O completion time.
 	 */
-	atomic_inc(&inode->i_dio_count);
+	if (!(dio->flags & DIO_SKIP_DIO_COUNT))
+		inode_dio_inc(inode);
 
 	retval = 0;
 	sdio.blkbits = blkbits;
diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c
index 45fe924f82bc..bf77e5b73ae2 100644
--- a/fs/ext4/indirect.c
+++ b/fs/ext4/indirect.c
@@ -682,11 +682,11 @@ retry:
 		 * via ext4_inode_block_unlocked_dio(). Check inode's state
 		 * while holding extra i_dio_count ref.
 		 */
-		atomic_inc(&inode->i_dio_count);
+		inode_dio_inc(inode);
 		smp_mb();
 		if (unlikely(ext4_test_inode_state(inode,
 						    EXT4_STATE_DIOREAD_LOCK))) {
-			inode_dio_done(inode);
+			inode_dio_dec(inode);
 			goto locked;
 		}
 		if (IS_DAX(inode))
@@ -696,7 +696,7 @@ retry:
 			ret = __blockdev_direct_IO(rw, iocb, inode,
 					inode->i_sb->s_bdev, iter, offset,
 					ext4_get_block, NULL, NULL, 0);
-		inode_dio_done(inode);
+		inode_dio_dec(inode);
 	} else {
 locked:
 		if (IS_DAX(inode))
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 5cb9a212b86f..62d4615aa9b0 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2978,7 +2978,7 @@ static ssize_t ext4_ext_direct_IO(int rw, struct kiocb *iocb,
 	 * overwrite DIO as i_dio_count needs to be incremented under i_mutex.
 	 */
 	if (rw == WRITE)
-		atomic_inc(&inode->i_dio_count);
+		inode_dio_inc(inode);
 
 	/* If we do a overwrite dio, i_mutex locking can be released */
 	overwrite = *((int *)iocb->private);
@@ -3080,7 +3080,7 @@ static ssize_t ext4_ext_direct_IO(int rw, struct kiocb *iocb,
 
 retake_lock:
 	if (rw == WRITE)
-		inode_dio_done(inode);
+		inode_dio_dec(inode);
 	/* take i_mutex locking again if we do a ovewrite dio */
 	if (overwrite) {
 		up_read(&EXT4_I(inode)->i_data_sem);
diff --git a/fs/inode.c b/fs/inode.c
index f00b16f45507..c4901c40ad65 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1946,18 +1946,31 @@ void inode_dio_wait(struct inode *inode)
 EXPORT_SYMBOL(inode_dio_wait);
 
 /*
- * inode_dio_done - signal finish of a direct I/O requests
+ * inode_dio_begin - signal start of a direct I/O requests
  * @inode: inode the direct I/O happens on
  *
  * This is called once we've finished processing a direct I/O request,
  * and is used to wake up callers waiting for direct I/O to be quiesced.
  */
-void inode_dio_done(struct inode *inode)
+void inode_dio_inc(struct inode *inode)
+{
+	atomic_inc(&inode->i_dio_count);
+}
+EXPORT_SYMBOL(inode_dio_inc);
+
+/*
+ * inode_dio_dec - signal finish of a direct I/O requests
+ * @inode: inode the direct I/O happens on
+ *
+ * This is called once we've finished processing a direct I/O request,
+ * and is used to wake up callers waiting for direct I/O to be quiesced.
+ */
+void inode_dio_dec(struct inode *inode)
 {
 	if (atomic_dec_and_test(&inode->i_dio_count))
 		wake_up_bit(&inode->i_state, __I_DIO_WAKEUP);
 }
-EXPORT_SYMBOL(inode_dio_done);
+EXPORT_SYMBOL(inode_dio_dec);
 
 /*
  * inode_set_flags - atomically set some inode flags
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index e907c8cf732e..2890317173eb 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -387,7 +387,7 @@ static void nfs_direct_complete(struct nfs_direct_req *dreq, bool write)
 	if (write)
 		nfs_zap_mapping(inode, inode->i_mapping);
 
-	inode_dio_done(inode);
+	inode_dio_dec(inode);
 
 	if (dreq->iocb) {
 		long res = (long) dreq->error;
@@ -487,7 +487,7 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
 			     &nfs_direct_read_completion_ops);
 	get_dreq(dreq);
 	desc.pg_dreq = dreq;
-	atomic_inc(&inode->i_dio_count);
+	inode_dio_inc(inode);
 
 	while (iov_iter_count(iter)) {
 		struct page **pagevec;
@@ -539,7 +539,7 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
 	 * generic layer handle the completion.
 	 */
 	if (requested_bytes == 0) {
-		inode_dio_done(inode);
+		inode_dio_dec(inode);
 		nfs_direct_req_release(dreq);
 		return result < 0 ? result : -EIO;
 	}
@@ -873,7 +873,7 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
 			      &nfs_direct_write_completion_ops);
 	desc.pg_dreq = dreq;
 	get_dreq(dreq);
-	atomic_inc(&inode->i_dio_count);
+	inode_dio_inc(inode);
 
 	NFS_I(inode)->write_io += iov_iter_count(iter);
 	while (iov_iter_count(iter)) {
@@ -929,7 +929,7 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
 	 * generic layer handle the completion.
 	 */
 	if (requested_bytes == 0) {
-		inode_dio_done(inode);
+		inode_dio_dec(inode);
 		nfs_direct_req_release(dreq);
 		return result < 0 ? result : -EIO;
 	}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 52cc4492cb3a..1292ce1d9be5 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2613,6 +2613,9 @@ enum {
 
 	/* filesystem can handle aio writes beyond i_size */
 	DIO_ASYNC_EXTEND = 0x04,
+
+	/* inode/fs/bdev does not need truncate protection */
+	DIO_SKIP_DIO_COUNT = 0x08,
 };
 
 void dio_end_io(struct bio *bio, int error);
@@ -2633,7 +2636,8 @@ static inline ssize_t blockdev_direct_IO(int rw, struct kiocb *iocb,
 #endif
 
 void inode_dio_wait(struct inode *inode);
-void inode_dio_done(struct inode *inode);
+void inode_dio_inc(struct inode *inode);
+void inode_dio_dec(struct inode *inode);
 
 extern void inode_set_flags(struct inode *inode, unsigned int flags,
 			    unsigned int mask);
-- 
1.9.1



* [PATCH 2/3] btrfs: pass in DIO_SKIP_DIO_COUNT to do_blockdev_direct_IO()
  2015-04-15 22:01 [PATCH v2] direct-io: only inc/dec inode->i_dio_count for file systems Jens Axboe
  2015-04-15 22:01 ` [PATCH 1/3] " Jens Axboe
@ 2015-04-15 22:01 ` Jens Axboe
  2015-04-15 22:01 ` [PATCH 3/3] ext4: " Jens Axboe
  2015-04-15 22:05 ` [PATCH v2] direct-io: only inc/dec inode->i_dio_count for file systems Al Viro
  3 siblings, 0 replies; 12+ messages in thread
From: Jens Axboe @ 2015-04-15 22:01 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel; +Cc: Jens Axboe, Chris Mason

btrfs_direct_IO() already holds the inode i_dio_count elevated, so
pass in DIO_SKIP_DIO_COUNT so that do_blockdev_direct_IO() does not
need to take it again. If we end up dropping the inode dio count
before calling it, we drop the flag as well.

Cc: Chris Mason <clm@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
---
 fs/btrfs/inode.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 6fe341a66ed8..3ecac37f10e0 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8121,8 +8121,7 @@ static ssize_t btrfs_direct_IO(int rw, struct kiocb *iocb,
 	struct inode *inode = file->f_mapping->host;
 	u64 outstanding_extents = 0;
 	size_t count = 0;
-	int flags = 0;
-	bool wakeup = true;
+	int flags = DIO_SKIP_DIO_COUNT;
 	bool relock = false;
 	ssize_t ret;
 
@@ -8170,8 +8169,11 @@ static ssize_t btrfs_direct_IO(int rw, struct kiocb *iocb,
 	} else if (test_bit(BTRFS_INODE_READDIO_NEED_LOCK,
 				     &BTRFS_I(inode)->runtime_flags)) {
 		inode_dio_dec(inode);
+		/*
+		 * we now need the i_dio_count inc/dec, the below overwrites
+		 * the skip inc/dec flag.
+		 */
 		flags = DIO_LOCKING | DIO_SKIP_HOLES;
-		wakeup = false;
 	}
 
 	ret = __blockdev_direct_IO(rw, iocb, inode,
@@ -8187,7 +8189,7 @@ static ssize_t btrfs_direct_IO(int rw, struct kiocb *iocb,
 						     count - (size_t)ret);
 	}
 out:
-	if (wakeup)
+	if (flags & DIO_SKIP_DIO_COUNT)
 		inode_dio_dec(inode);
 	if (relock)
 		mutex_lock(&inode->i_mutex);
-- 
1.9.1



* [PATCH 3/3] ext4: pass in DIO_SKIP_DIO_COUNT to do_blockdev_direct_IO()
  2015-04-15 22:01 [PATCH v2] direct-io: only inc/dec inode->i_dio_count for file systems Jens Axboe
  2015-04-15 22:01 ` [PATCH 1/3] " Jens Axboe
  2015-04-15 22:01 ` [PATCH 2/3] btrfs: pass in DIO_SKIP_DIO_COUNT to do_blockdev_direct_IO() Jens Axboe
@ 2015-04-15 22:01 ` Jens Axboe
  2015-04-15 22:05 ` [PATCH v2] direct-io: only inc/dec inode->i_dio_count for file systems Al Viro
  3 siblings, 0 replies; 12+ messages in thread
From: Jens Axboe @ 2015-04-15 22:01 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel; +Cc: Jens Axboe, Theodore Ts'o

For writes, ext4_ext_direct_IO() already holds the inode i_dio_count
elevated, so we don't need the extra inc/dec that is done in
do_blockdev_direct_IO(). Pass in DIO_SKIP_DIO_COUNT to tell it that.

Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jens Axboe <axboe@fb.com>
---
 fs/ext4/inode.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 62d4615aa9b0..2605355791a3 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2977,8 +2977,10 @@ static ssize_t ext4_ext_direct_IO(int rw, struct kiocb *iocb,
 	 * conversion. This also disallows race between truncate() and
 	 * overwrite DIO as i_dio_count needs to be incremented under i_mutex.
 	 */
-	if (rw == WRITE)
+	if (rw == WRITE) {
 		inode_dio_inc(inode);
+		dio_flags |= DIO_SKIP_DIO_COUNT;
+	}
 
 	/* If we do a overwrite dio, i_mutex locking can be released */
 	overwrite = *((int *)iocb->private);
@@ -3032,7 +3034,7 @@ static ssize_t ext4_ext_direct_IO(int rw, struct kiocb *iocb,
 		get_block_func = ext4_get_block_write_nolock;
 	} else {
 		get_block_func = ext4_get_block_write;
-		dio_flags = DIO_LOCKING;
+		dio_flags |= DIO_LOCKING;
 	}
 	if (IS_DAX(inode))
 		ret = dax_do_io(rw, iocb, inode, iter, offset, get_block_func,
-- 
1.9.1



* Re: [PATCH v2] direct-io: only inc/dec inode->i_dio_count for file systems
  2015-04-15 22:01 [PATCH v2] direct-io: only inc/dec inode->i_dio_count for file systems Jens Axboe
                   ` (2 preceding siblings ...)
  2015-04-15 22:01 ` [PATCH 3/3] ext4: " Jens Axboe
@ 2015-04-15 22:05 ` Al Viro
  2015-04-15 22:06   ` Jens Axboe
  3 siblings, 1 reply; 12+ messages in thread
From: Al Viro @ 2015-04-15 22:05 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel, linux-fsdevel

On Wed, Apr 15, 2015 at 04:01:35PM -0600, Jens Axboe wrote:
> Hi,
> 
> A proper patch series posting of the patch to skip inode dio counts
> where we don't need them. Performance results and general notes in
> patch #1. We can take this a bit further if we want to push this
> truncate locking into the caller, but I'd like to keep that as a
> separate patch series instead.
> 
> Changes since v1:
> 
> - Changed flag to DIO_SKIP_DIO_COUNT. Patch 2+3 will use this for
>   skipping the extra inc/dec for btrfs and ext4, so better rename it
>   to something that better tells what is happening in the deep mess
>   that is dio.
> 
> - Have inode_dio_inc/dec() helpers. Again, this better tells the caller
>   what is going on.

Please, rebase on vfs.git#for-next; it conflicts (at least) with Omar's
->direct_IO() patches, if not with generic_write_checks() ones following
those.


* Re: [PATCH v2] direct-io: only inc/dec inode->i_dio_count for file systems
  2015-04-15 22:05 ` [PATCH v2] direct-io: only inc/dec inode->i_dio_count for file systems Al Viro
@ 2015-04-15 22:06   ` Jens Axboe
  0 siblings, 0 replies; 12+ messages in thread
From: Jens Axboe @ 2015-04-15 22:06 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-kernel, linux-fsdevel

On 04/15/2015 04:05 PM, Al Viro wrote:
> On Wed, Apr 15, 2015 at 04:01:35PM -0600, Jens Axboe wrote:
>> Hi,
>>
>> A proper patch series posting of the patch to skip inode dio counts
>> where we don't need them. Performance results and general notes in
>> patch #1. We can take this a bit further if we want to push this
>> truncate locking into the caller, but I'd like to keep that as a
>> separate patch series instead.
>>
>> Changes since v1:
>>
>> - Changed flag to DIO_SKIP_DIO_COUNT. Patch 2+3 will use this for
>>    skipping the extra inc/dec for btrfs and ext4, so better rename it
>>    to something that better tells what is happening in the deep mess
>>    that is dio.
>>
>> - Have inode_dio_inc/dec() helpers. Again, this better tells the caller
>>    what is going on.
>
> Please, rebase on vfs.git#for-next; it conflicts (at least) with Omar's
> ->direct_IO() patches, if not with generic_write_checks() ones following
> those.

Sure, I can do that.

-- 
Jens Axboe



* Re: [PATCH 1/3] direct-io: only inc/dec inode->i_dio_count for file systems
  2015-04-15 22:01 ` [PATCH 1/3] " Jens Axboe
@ 2015-04-15 22:36   ` Dave Chinner
  2015-04-15 22:56     ` Al Viro
  2015-04-15 22:57     ` Jens Axboe
  0 siblings, 2 replies; 12+ messages in thread
From: Dave Chinner @ 2015-04-15 22:36 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-kernel, linux-fsdevel, Andrew Morton, Christoph Hellwig,
	Theodore Ts'o, Elliott, Robert (Server Storage),
	Al Viro

On Wed, Apr 15, 2015 at 04:01:36PM -0600, Jens Axboe wrote:
> do_blockdev_direct_IO() increments and decrements the inode
> ->i_dio_count for each IO operation. It does this to protect against
> truncate of a file. Block devices don't need this sort of protection.
> 
> For a capable multiqueue setup, this atomic int is the only shared
> state between applications accessing the device for O_DIRECT, and it
> presents a scaling wall for that. In my testing, as much as 30% of
> system time is spent incrementing and decrementing this value. A mixed
> read/write workload improved from ~2.5M IOPS to ~9.6M IOPS, with
> better latencies too. Before:
.....
> diff --git a/fs/inode.c b/fs/inode.c
> index f00b16f45507..c4901c40ad65 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -1946,18 +1946,31 @@ void inode_dio_wait(struct inode *inode)
>  EXPORT_SYMBOL(inode_dio_wait);
>  
>  /*
> - * inode_dio_done - signal finish of a direct I/O requests
> + * inode_dio_begin - signal start of a direct I/O requests
>   * @inode: inode the direct I/O happens on
>   *
>   * This is called once we've finished processing a direct I/O request,
>   * and is used to wake up callers waiting for direct I/O to be quiesced.
>   */
> -void inode_dio_done(struct inode *inode)
> +void inode_dio_inc(struct inode *inode)

function name does not match docbook comment....

> +{
> +	atomic_inc(&inode->i_dio_count);
> +}
> +EXPORT_SYMBOL(inode_dio_inc);
> +
> +/*
> + * inode_dio_dec - signal finish of a direct I/O requests
> + * @inode: inode the direct I/O happens on
> + *
> + * This is called once we've finished processing a direct I/O request,
> + * and is used to wake up callers waiting for direct I/O to be quiesced.
> + */
> +void inode_dio_dec(struct inode *inode)
>  {
>  	if (atomic_dec_and_test(&inode->i_dio_count))
>  		wake_up_bit(&inode->i_state, __I_DIO_WAKEUP);
>  }
> -EXPORT_SYMBOL(inode_dio_done);
> +EXPORT_SYMBOL(inode_dio_dec);

Bikeshedding: I think this would be better suited to inode_dio_begin()
and inode_dio_end() because now we are trying to say "this is where
the DIO starts, and this is where it ends". It's not really a
"reference counting" interface, we're trying to annotate the
boundaries of where DIO is protected against truncate....

And, realistically, if we are pushing this up into the filesystems
again, we should push it up into *all* filesystems and get rid of it
completely from the DIO layer. That way no new twisty passages in
the direct IO code are needed.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 1/3] direct-io: only inc/dec inode->i_dio_count for file systems
  2015-04-15 22:36   ` Dave Chinner
@ 2015-04-15 22:56     ` Al Viro
  2015-04-15 23:05       ` Jens Axboe
  2015-04-15 22:57     ` Jens Axboe
  1 sibling, 1 reply; 12+ messages in thread
From: Al Viro @ 2015-04-15 22:56 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Jens Axboe, linux-kernel, linux-fsdevel, Andrew Morton,
	Christoph Hellwig, Theodore Ts'o, Elliott,
	Robert (Server Storage)

On Thu, Apr 16, 2015 at 08:36:20AM +1000, Dave Chinner wrote:

> Bikeshedding: I think this would be better suited to inode_dio_begin()
> and inode_dio_end() because now we are trying to say "this is where
> the DIO starts, and this is where it ends". It's not really a
> "reference counting" interface, we're trying to annotate the
> boundaries of where DIO is protected against truncate....

*nod*

And while we are at it, inode_dio_begin() could be static inline just fine.

> And, realistically, if we are pushing this up into the filesystems
> again, we should push it up into *all* filesystems and get rid of it
> completely from the DIO layer. That way no new twisty passages in
> the direct IO code are needed.

Maybe...  FWIW, since the code *checking* ->i_dio_count is already
inside the individual filesystems (and right now I'm trying to verify
it), we might as well have the code changing it there as well.  AFAICS,
the main problem with that is the need to do it after the end of async
operation.  Sure, we do have callbacks, but we'd need something like
	* start protection
	* submit DIO
	* if that has failed synchronously, drop protection and fuck off
	* otherwise let the callback deal with dropping it.
What's more, we'd get not just a slew of boilerplate callbacks, but make
life interesting for branch predictor - indirect calls are not fun...


* Re: [PATCH 1/3] direct-io: only inc/dec inode->i_dio_count for file systems
  2015-04-15 22:36   ` Dave Chinner
  2015-04-15 22:56     ` Al Viro
@ 2015-04-15 22:57     ` Jens Axboe
  1 sibling, 0 replies; 12+ messages in thread
From: Jens Axboe @ 2015-04-15 22:57 UTC (permalink / raw)
  To: Dave Chinner
  Cc: linux-kernel, linux-fsdevel, Andrew Morton, Christoph Hellwig,
	Theodore Ts'o, Elliott, Robert (Server Storage),
	Al Viro

On 04/15/2015 04:36 PM, Dave Chinner wrote:
> On Wed, Apr 15, 2015 at 04:01:36PM -0600, Jens Axboe wrote:
>> do_blockdev_direct_IO() increments and decrements the inode
>> ->i_dio_count for each IO operation. It does this to protect against
>> truncate of a file. Block devices don't need this sort of protection.
>>
>> For a capable multiqueue setup, this atomic int is the only shared
>> state between applications accessing the device for O_DIRECT, and it
>> presents a scaling wall for that. In my testing, as much as 30% of
>> system time is spent incrementing and decrementing this value. A mixed
>> read/write workload improved from ~2.5M IOPS to ~9.6M IOPS, with
>> better latencies too. Before:
> .....
>> diff --git a/fs/inode.c b/fs/inode.c
>> index f00b16f45507..c4901c40ad65 100644
>> --- a/fs/inode.c
>> +++ b/fs/inode.c
>> @@ -1946,18 +1946,31 @@ void inode_dio_wait(struct inode *inode)
>>   EXPORT_SYMBOL(inode_dio_wait);
>>
>>   /*
>> - * inode_dio_done - signal finish of a direct I/O requests
>> + * inode_dio_begin - signal start of a direct I/O requests
>>    * @inode: inode the direct I/O happens on
>>    *
>>    * This is called once we've finished processing a direct I/O request,
>>    * and is used to wake up callers waiting for direct I/O to be quiesced.
>>    */
>> -void inode_dio_done(struct inode *inode)
>> +void inode_dio_inc(struct inode *inode)
>
> function name does not match docbook comment....

Oops, will fix that up.

>> +{
>> +	atomic_inc(&inode->i_dio_count);
>> +}
>> +EXPORT_SYMBOL(inode_dio_inc);
>> +
>> +/*
>> + * inode_dio_dec - signal finish of a direct I/O requests
>> + * @inode: inode the direct I/O happens on
>> + *
>> + * This is called once we've finished processing a direct I/O request,
>> + * and is used to wake up callers waiting for direct I/O to be quiesced.
>> + */
>> +void inode_dio_dec(struct inode *inode)
>>   {
>>   	if (atomic_dec_and_test(&inode->i_dio_count))
>>   		wake_up_bit(&inode->i_state, __I_DIO_WAKEUP);
>>   }
>> -EXPORT_SYMBOL(inode_dio_done);
>> +EXPORT_SYMBOL(inode_dio_dec);
>
> Bikeshedding: I think this would be better suited to inode_dio_begin()
> and inode_dio_end() because now we are trying to say "this is where
> the DIO starts, and this is where it ends". It's not really a
> "reference counting" interface, we're trying to annotate the
> boundaries of where DIO is protected against truncate....

I don't really care, if people like begin/end more than inc/dec, I'm 
happy with that.

> And, realistically, if we are pushing this up into the filesystems
> again, we should push it up into *all* filesystems and get rid of it
> completely from the DIO layer. That way no new twisty passages in
> the direct IO code are needed.

Lets please keep that for a potential round 2. It's not like I'm piling 
lots of hacks on, it's two one-liner changes. It's not adding a lot to 
the entropy of direct-io.c. I've been carrying this patch for years now, 
I really don't want to sign up for futzing around in direct-io.c, nor is 
that a reasonable requirement imho.

-- 
Jens Axboe



* Re: [PATCH 1/3] direct-io: only inc/dec inode->i_dio_count for file systems
  2015-04-15 22:56     ` Al Viro
@ 2015-04-15 23:05       ` Jens Axboe
  2015-04-15 23:30         ` Al Viro
  0 siblings, 1 reply; 12+ messages in thread
From: Jens Axboe @ 2015-04-15 23:05 UTC (permalink / raw)
  To: Al Viro, Dave Chinner
  Cc: linux-kernel, linux-fsdevel, Andrew Morton, Christoph Hellwig,
	Theodore Ts'o, Elliott, Robert (Server Storage)

[-- Attachment #1: Type: text/plain, Size: 651 bytes --]

On 04/15/2015 04:56 PM, Al Viro wrote:
> On Thu, Apr 16, 2015 at 08:36:20AM +1000, Dave Chinner wrote:
>
>> Bikeshedding: I think this would be better suited to inode_dio_begin()
>> and inode_dio_end() because now we are trying to say "this is where
>> the DIO starts, and this is where it ends". It's not really a
>> "reference counting" interface, we're trying to annotate the
>> boundaries of where DIO is protected against truncate....
>
> *nod*
>
> And while we are at it, inode_dio_begin() could be static inline just fine.

Done (rename and docbook), and inode_dio_{begin,end}() made static inlines.

v3 against vfs-next attached.

-- 
Jens Axboe


[-- Attachment #2: 0001-direct-io-only-inc-dec-inode-i_dio_count-for-file-sy.patch --]
[-- Type: text/x-patch, Size: 10746 bytes --]

From 9f70c975b8bd7dac51c2e512b7b1d6aa6f44c323 Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe@fb.com>
Date: Wed, 15 Apr 2015 17:03:56 -0600
Subject: [PATCH] direct-io: only inc/dec inode->i_dio_count for file systems

do_blockdev_direct_IO() increments and decrements the inode
->i_dio_count for each IO operation. It does this to protect against
truncate of a file. Block devices don't need this sort of protection.

For a capable multiqueue setup, this atomic int is the only shared
state between applications accessing the device for O_DIRECT, and it
presents a scaling wall for that. In my testing, as much as 30% of
system time is spent incrementing and decrementing this value. A mixed
read/write workload improved from ~2.5M IOPS to ~9.6M IOPS, with
better latencies too. Before:

clat percentiles (usec):
 |  1.00th=[   33],  5.00th=[   34], 10.00th=[   34], 20.00th=[   34],
 | 30.00th=[   34], 40.00th=[   34], 50.00th=[   35], 60.00th=[   35],
 | 70.00th=[   35], 80.00th=[   35], 90.00th=[   37], 95.00th=[   80],
 | 99.00th=[   98], 99.50th=[  151], 99.90th=[  155], 99.95th=[  155],
 | 99.99th=[  165]

After:

clat percentiles (usec):
 |  1.00th=[   95],  5.00th=[  108], 10.00th=[  129], 20.00th=[  149],
 | 30.00th=[  155], 40.00th=[  161], 50.00th=[  167], 60.00th=[  171],
 | 70.00th=[  177], 80.00th=[  185], 90.00th=[  201], 95.00th=[  270],
 | 99.00th=[  390], 99.50th=[  398], 99.90th=[  418], 99.95th=[  422],
 | 99.99th=[  438]

In other setups, Robert Elliott reported seeing good performance
improvements:

https://lkml.org/lkml/2015/4/3/557

The more applications accessing the device, the worse it gets.

Add a new direct-io flag, DIO_SKIP_DIO_COUNT, which tells
do_blockdev_direct_IO() that it need not worry about incrementing
or decrementing the inode i_dio_count for this caller.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Elliott, Robert (Server Storage) <elliott@hp.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jens Axboe <axboe@fb.com>
---
 fs/block_dev.c     |  3 ++-
 fs/btrfs/inode.c   |  6 +++---
 fs/dax.c           |  4 ++--
 fs/direct-io.c     |  7 +++++--
 fs/ext4/indirect.c |  6 +++---
 fs/ext4/inode.c    |  4 ++--
 fs/inode.c         | 14 --------------
 fs/nfs/direct.c    | 10 +++++-----
 include/linux/fs.h | 29 ++++++++++++++++++++++++++++-
 9 files changed, 50 insertions(+), 33 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 79b4fa3b391d..c7e4163ede87 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -152,7 +152,8 @@ blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, loff_t offset)
 	struct inode *inode = file->f_mapping->host;
 
 	return __blockdev_direct_IO(iocb, inode, I_BDEV(inode), iter, offset,
-				    blkdev_get_block, NULL, NULL, 0);
+				    blkdev_get_block, NULL, NULL,
+				    DIO_SKIP_DIO_COUNT);
 }
 
 int __sync_blockdev(struct block_device *bdev, int wait)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 811576346a92..9b774a0c9ca6 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8129,7 +8129,7 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
 	if (check_direct_IO(BTRFS_I(inode)->root, iocb, iter, offset))
 		return 0;
 
-	atomic_inc(&inode->i_dio_count);
+	inode_dio_begin(inode);
 	smp_mb__after_atomic();
 
 	/*
@@ -8169,7 +8169,7 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
 		current->journal_info = &outstanding_extents;
 	} else if (test_bit(BTRFS_INODE_READDIO_NEED_LOCK,
 				     &BTRFS_I(inode)->runtime_flags)) {
-		inode_dio_done(inode);
+		inode_dio_end(inode);
 		flags = DIO_LOCKING | DIO_SKIP_HOLES;
 		wakeup = false;
 	}
@@ -8188,7 +8188,7 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
 	}
 out:
 	if (wakeup)
-		inode_dio_done(inode);
+		inode_dio_end(inode);
 	if (relock)
 		mutex_lock(&inode->i_mutex);
 
diff --git a/fs/dax.c b/fs/dax.c
index a27846946525..cdde2b354e47 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -209,7 +209,7 @@ ssize_t dax_do_io(struct kiocb *iocb, struct inode *inode,
 	}
 
 	/* Protects against truncate */
-	atomic_inc(&inode->i_dio_count);
+	inode_dio_begin(inode);
 
 	retval = dax_io(inode, iter, pos, end, get_block, &bh);
 
@@ -219,7 +219,7 @@ ssize_t dax_do_io(struct kiocb *iocb, struct inode *inode,
 	if ((retval > 0) && end_io)
 		end_io(iocb, pos, retval, bh.b_private);
 
-	inode_dio_done(inode);
+	inode_dio_end(inode);
  out:
 	return retval;
 }
diff --git a/fs/direct-io.c b/fs/direct-io.c
index c3b560b24a46..745d2342651a 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -253,7 +253,9 @@ static ssize_t dio_complete(struct dio *dio, loff_t offset, ssize_t ret,
 	if (dio->end_io && dio->result)
 		dio->end_io(dio->iocb, offset, transferred, dio->private);
 
-	inode_dio_done(dio->inode);
+	if (!(dio->flags & DIO_SKIP_DIO_COUNT))
+		inode_dio_end(dio->inode);
+
 	if (is_async) {
 		if (dio->rw & WRITE) {
 			int err;
@@ -1195,7 +1197,8 @@ do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 	/*
 	 * Will be decremented at I/O completion time.
 	 */
-	atomic_inc(&inode->i_dio_count);
+	if (!(dio->flags & DIO_SKIP_DIO_COUNT))
+		inode_dio_begin(inode);
 
 	retval = 0;
 	sdio.blkbits = blkbits;
diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c
index 3580629e42d3..958824019509 100644
--- a/fs/ext4/indirect.c
+++ b/fs/ext4/indirect.c
@@ -682,11 +682,11 @@ retry:
 		 * via ext4_inode_block_unlocked_dio(). Check inode's state
 		 * while holding extra i_dio_count ref.
 		 */
-		atomic_inc(&inode->i_dio_count);
+		inode_dio_begin(inode);
 		smp_mb();
 		if (unlikely(ext4_test_inode_state(inode,
 						    EXT4_STATE_DIOREAD_LOCK))) {
-			inode_dio_done(inode);
+			inode_dio_end(inode);
 			goto locked;
 		}
 		if (IS_DAX(inode))
@@ -697,7 +697,7 @@ retry:
 						   inode->i_sb->s_bdev, iter,
 						   offset, ext4_get_block, NULL,
 						   NULL, 0);
-		inode_dio_done(inode);
+		inode_dio_end(inode);
 	} else {
 locked:
 		if (IS_DAX(inode))
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 063052e4aa8b..bccec41fb94b 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2977,7 +2977,7 @@ static ssize_t ext4_ext_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
 	 * overwrite DIO as i_dio_count needs to be incremented under i_mutex.
 	 */
 	if (iov_iter_rw(iter) == WRITE)
-		atomic_inc(&inode->i_dio_count);
+		inode_dio_begin(inode);
 
 	/* If we do a overwrite dio, i_mutex locking can be released */
 	overwrite = *((int *)iocb->private);
@@ -3079,7 +3079,7 @@ static ssize_t ext4_ext_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
 
 retake_lock:
 	if (iov_iter_rw(iter) == WRITE)
-		inode_dio_done(inode);
+		inode_dio_end(inode);
 	/* take i_mutex locking again if we do a ovewrite dio */
 	if (overwrite) {
 		up_read(&EXT4_I(inode)->i_data_sem);
diff --git a/fs/inode.c b/fs/inode.c
index 94886f9fbb06..ea37cd17b53f 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1946,20 +1946,6 @@ void inode_dio_wait(struct inode *inode)
 EXPORT_SYMBOL(inode_dio_wait);
 
 /*
- * inode_dio_done - signal finish of a direct I/O requests
- * @inode: inode the direct I/O happens on
- *
- * This is called once we've finished processing a direct I/O request,
- * and is used to wake up callers waiting for direct I/O to be quiesced.
- */
-void inode_dio_done(struct inode *inode)
-{
-	if (atomic_dec_and_test(&inode->i_dio_count))
-		wake_up_bit(&inode->i_state, __I_DIO_WAKEUP);
-}
-EXPORT_SYMBOL(inode_dio_done);
-
-/*
  * inode_set_flags - atomically set some inode flags
  *
  * Note: the caller should be holding i_mutex, or else be sure that
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index ed0e6031be88..b2cbc3a6cdd9 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -386,7 +386,7 @@ static void nfs_direct_complete(struct nfs_direct_req *dreq, bool write)
 	if (write)
 		nfs_zap_mapping(inode, inode->i_mapping);
 
-	inode_dio_done(inode);
+	inode_dio_end(inode);
 
 	if (dreq->iocb) {
 		long res = (long) dreq->error;
@@ -486,7 +486,7 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
 			     &nfs_direct_read_completion_ops);
 	get_dreq(dreq);
 	desc.pg_dreq = dreq;
-	atomic_inc(&inode->i_dio_count);
+	inode_dio_begin(inode);
 
 	while (iov_iter_count(iter)) {
 		struct page **pagevec;
@@ -538,7 +538,7 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
 	 * generic layer handle the completion.
 	 */
 	if (requested_bytes == 0) {
-		inode_dio_done(inode);
+		inode_dio_end(inode);
 		nfs_direct_req_release(dreq);
 		return result < 0 ? result : -EIO;
 	}
@@ -872,7 +872,7 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
 			      &nfs_direct_write_completion_ops);
 	desc.pg_dreq = dreq;
 	get_dreq(dreq);
-	atomic_inc(&inode->i_dio_count);
+	inode_dio_begin(inode);
 
 	NFS_I(inode)->write_io += iov_iter_count(iter);
 	while (iov_iter_count(iter)) {
@@ -928,7 +928,7 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
 	 * generic layer handle the completion.
 	 */
 	if (requested_bytes == 0) {
-		inode_dio_done(inode);
+		inode_dio_end(inode);
 		nfs_direct_req_release(dreq);
 		return result < 0 ? result : -EIO;
 	}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index b1d7db28c13c..9055eefa92c7 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2635,6 +2635,9 @@ enum {
 
 	/* filesystem can handle aio writes beyond i_size */
 	DIO_ASYNC_EXTEND = 0x04,
+
+	/* inode/fs/bdev does not need truncate protection */
+	DIO_SKIP_DIO_COUNT = 0x08,
 };
 
 void dio_end_io(struct bio *bio, int error);
@@ -2657,7 +2660,31 @@ static inline ssize_t blockdev_direct_IO(struct kiocb *iocb,
 #endif
 
 void inode_dio_wait(struct inode *inode);
-void inode_dio_done(struct inode *inode);
+
+/*
+ * inode_dio_begin - signal start of a direct I/O request
+ * @inode: inode the direct I/O happens on
+ *
+ * This is called before we start processing a direct I/O request,
+ * and holds off truncate until the matching inode_dio_end().
+ */
+static inline void inode_dio_begin(struct inode *inode)
+{
+	atomic_inc(&inode->i_dio_count);
+}
+
+/*
+ * inode_dio_end - signal finish of a direct I/O request
+ * @inode: inode the direct I/O happens on
+ *
+ * This is called once we've finished processing a direct I/O request,
+ * and is used to wake up callers waiting for direct I/O to be quiesced.
+ */
+static inline void inode_dio_end(struct inode *inode)
+{
+	if (atomic_dec_and_test(&inode->i_dio_count))
+		wake_up_bit(&inode->i_state, __I_DIO_WAKEUP);
+}
 
 extern void inode_set_flags(struct inode *inode, unsigned int flags,
 			    unsigned int mask);
-- 
1.9.1
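What the counter buys the filesystem callers is the drain barrier in
inode_dio_wait(), visible at the top of the fs/inode.c hunk: truncate waits
for the count to reach zero before shrinking the file. A dependency-free
userspace sketch of that barrier follows — the names are hypothetical, and a
busy-wait stands in for the kernel's sleep in wait_on_bit() on
__I_DIO_WAKEUP:

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model: only the in-flight DIO count is represented. */
struct fake_inode {
	atomic_int i_dio_count;
};

/* Truncate-side barrier: returns only once no DIO is in flight.
 * The kernel sleeps via wait_on_bit() rather than spinning; the spin
 * keeps this sketch self-contained. */
static void dio_wait(struct fake_inode *ino)
{
	while (atomic_load(&ino->i_dio_count) > 0)
		; /* would sleep here */
}

/* Why filesystems cannot blindly pass DIO_SKIP_DIO_COUNT: their truncate
 * path must drain DIO before freeing blocks. A raw block device inode is
 * never truncated, so the bdev path can skip the count safely. */
static void fake_truncate(struct fake_inode *ino)
{
	dio_wait(ino);
	/* ... shrink i_size, free blocks ... */
}
```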



* Re: [PATCH 1/3] direct-io: only inc/dec inode->i_dio_count for file systems
  2015-04-15 23:05       ` Jens Axboe
@ 2015-04-15 23:30         ` Al Viro
  2015-04-15 23:50           ` Jens Axboe
  0 siblings, 1 reply; 12+ messages in thread
From: Al Viro @ 2015-04-15 23:30 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Dave Chinner, linux-kernel, linux-fsdevel, Andrew Morton,
	Christoph Hellwig, Theodore Ts'o, Elliott,
	Robert (Server Storage)

On Wed, Apr 15, 2015 at 05:05:48PM -0600, Jens Axboe wrote:
> On 04/15/2015 04:56 PM, Al Viro wrote:
> >On Thu, Apr 16, 2015 at 08:36:20AM +1000, Dave Chinner wrote:
> >
> >>Bikeshedding: I think this would be better suited to inode_dio_begin()
> >>and inode_dio_end() because now we are trying to say "this is where
> >>the DIO starts, and this is where it ends". It's not really
> >>"reference counting" interface, we're trying to annotate the
> >boundaries of where DIO is protected against truncate....
> >
> >*nod*
> >
> >And while we are at, inode_dio_begin() could be static inline just fine.
> 
> Done (rename and docbook), and inode_dio_{begin,end}() made static inlines.
> 
> v3 against vfs-next attached.

Applied.


* Re: [PATCH 1/3] direct-io: only inc/dec inode->i_dio_count for file systems
  2015-04-15 23:30         ` Al Viro
@ 2015-04-15 23:50           ` Jens Axboe
  0 siblings, 0 replies; 12+ messages in thread
From: Jens Axboe @ 2015-04-15 23:50 UTC (permalink / raw)
  To: Al Viro
  Cc: Dave Chinner, linux-kernel, linux-fsdevel, Andrew Morton,
	Christoph Hellwig, Theodore Ts'o, Elliott,
	Robert (Server Storage)

On 04/15/2015 05:30 PM, Al Viro wrote:
> On Wed, Apr 15, 2015 at 05:05:48PM -0600, Jens Axboe wrote:
>> On 04/15/2015 04:56 PM, Al Viro wrote:
>>> On Thu, Apr 16, 2015 at 08:36:20AM +1000, Dave Chinner wrote:
>>>
>>>> Bikeshedding: I think this would be better suited to inode_dio_begin()
>>>> and inode_dio_end() because now we are trying to say "this is where
>>>> the DIO starts, and this is where it ends". It's not really
>>>> "reference counting" interface, we're trying to annotate the
>>>> boundaries of where DIO is protected against truncate....
>>>
>>> *nod*
>>>
>>> And while we are at, inode_dio_begin() could be static inline just fine.
>>
>> Done (rename and docbook), and inode_dio_{begin,end}() made static inlines.
>>
>> v3 against vfs-next attached.
>
> Applied.

Awesome, thanks!

-- 
Jens Axboe



