From: Trond Myklebust <trondmy@primarydata.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH v2 10/12] NFS: Do not serialise O_DIRECT reads and writes
Date: Wed, 22 Jun 2016 17:24:50 +0000
Message-ID: <97494C37-23D3-44FA-A9B8-1887E17429D9@primarydata.com>
In-Reply-To: <20160622164715.GB16823@infradead.org>


> On Jun 22, 2016, at 12:47, Christoph Hellwig <hch@infradead.org> wrote:
> 
> On Tue, Jun 21, 2016 at 05:34:51PM -0400, Trond Myklebust wrote:
>> Allow dio requests to be scheduled in parallel, while ensuring that they
>> do not conflict with buffered I/O.
> 
> Can you explain why we care about the full direct / buffered exclusion
> that no other file system seems to be doing?  I would have expected
> something more like the patch below, which follows the XFS locking
> model 1:1.  This passed xfstests with plain NFSv4, haven't done any pNFS
> testing yet:


If we’re going to worry about write atomicity in the buffered I/O case, then we really should make sure that O_DIRECT writes are atomic w.r.t. page cache updates too. With this locking model, a buffered read() can race with an O_DIRECT write() and return a mixture of old and new data.
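
To make the race concrete, here is a rough single-threaded replay of one interleaving that becomes possible when the buffered reader and the O_DIRECT writer both hold the lock shared. It is only a sketch: struct toy_file and the toy_* helpers are hypothetical stand-ins, not NFS client code, and one byte stands in for one page ('o' is old data, 'n' is new).

```c
#include <assert.h>

#define NPAGES 2

/* Hypothetical stand-in for a file: server contents plus client page cache. */
struct toy_file {
	char server[NPAGES];	/* one byte represents one page on the server */
	char cache[NPAGES];	/* client page cache contents */
	int cached[NPAGES];	/* nonzero if cache[i] is populated */
};

/* O_DIRECT write of one page: bypasses the cache, goes to the server. */
void toy_direct_write_page(struct toy_file *f, int page, char data)
{
	f->server[page] = data;
}

/* Buffered read of one page: fill the cache from the server on a miss. */
char toy_buffered_read_page(struct toy_file *f, int page)
{
	if (!f->cached[page]) {
		f->cache[page] = f->server[page];
		f->cached[page] = 1;
	}
	return f->cache[page];
}

/*
 * Replay one interleaving: the buffered read of both pages runs
 * between the two page writes of a single two-page O_DIRECT write.
 */
void toy_interleaved(char out[NPAGES])
{
	struct toy_file f = { .server = { 'o', 'o' } };

	toy_direct_write_page(&f, 0, 'n');	/* writer: page 0 lands */
	out[0] = toy_buffered_read_page(&f, 0);	/* reader: sees new data */
	out[1] = toy_buffered_read_page(&f, 1);	/* reader: sees old data */
	toy_direct_write_page(&f, 1, 'n');	/* writer: page 1 lands */
}
```

After toy_interleaved(out), out holds one new page followed by one old page, which is the mixed result a single read() should not return if the write is meant to be atomic.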

> 
> ---
> From 913189e25fc0e7318f077610c0aa8d5c1071c0c8 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig <hch@lst.de>
> Date: Wed, 22 Jun 2016 18:00:24 +0200
> Subject: nfs: do not serialise O_DIRECT reads and writes
> 
> Currently NFS takes the inode lock exclusive for all direct I/O reads and
> writes, which forces users of direct I/O to be unnecessarily synchronized.
> We only need the exclusive lock to invalidate the page cache, and could
> otherwise do with a shared lock to protect against buffered writers and
> truncate.  This also adds the shared lock to buffered reads to provide
> POSIX synchronized I/O guarantees for reads after synchronous writes,
> although that change shouldn't be necessary to allow parallel direct I/O.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> fs/nfs/direct.c | 47 +++++++++++++++++++++++++++++++----------------
> fs/nfs/file.c   |  4 +++-
> 2 files changed, 34 insertions(+), 17 deletions(-)
> 
> diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
> index 979b3c4..8b9c7e95 100644
> --- a/fs/nfs/direct.c
> +++ b/fs/nfs/direct.c
> @@ -581,10 +581,17 @@ ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter)
> 	if (!count)
> 		goto out;
> 
> -	inode_lock(inode);
> -	result = nfs_sync_mapping(mapping);
> -	if (result)
> -		goto out_unlock;
> +	if (mapping->nrpages) {
> +		inode_lock(inode);

This is unnecessary now that we have a rw_semaphore. You don’t need to take an exclusive lock in order to serialise w.r.t. new writes, and by doing so you end up serialising all reads whenever there happen to be pages in the page cache, whether or not those pages are dirty.
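
As an illustration of the admission rules involved, the following is a toy, single-threaded model of a reader/writer lock. It is not the kernel's rw_semaphore: struct toy_rwsem and every toy_* function are hypothetical names, and "trylock" here simply reports whether a caller would get in immediately or have to wait.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of reader/writer lock state; NOT the kernel's rw_semaphore. */
struct toy_rwsem {
	int readers;	/* number of shared holders */
	bool writer;	/* true while held exclusively */
};

/* Shared acquisition succeeds unless there is an exclusive holder. */
bool toy_trylock_shared(struct toy_rwsem *s)
{
	if (s->writer)
		return false;
	s->readers++;
	return true;
}

/* Exclusive acquisition succeeds only when nobody holds the lock. */
bool toy_trylock_exclusive(struct toy_rwsem *s)
{
	if (s->writer || s->readers > 0)
		return false;
	s->writer = true;
	return true;
}

/* Turn an exclusive hold into a shared one, in the spirit of downgrade_write(). */
void toy_downgrade(struct toy_rwsem *s)
{
	assert(s->writer);
	s->writer = false;
	s->readers = 1;
}

/* Two direct readers that both take the lock shared from the start. */
int admitted_shared_first(void)
{
	struct toy_rwsem s = { 0, false };
	int admitted = 0;

	admitted += toy_trylock_shared(&s);
	admitted += toy_trylock_shared(&s);
	return admitted;
}

/* First reader starts exclusive (pages are cached) and has not downgraded yet. */
int admitted_exclusive_first(void)
{
	struct toy_rwsem s = { 0, false };
	int admitted = 0;

	admitted += toy_trylock_exclusive(&s);
	admitted += toy_trylock_shared(&s);	/* would have to wait */
	return admitted;
}
```

In this model admitted_shared_first() admits both readers, while admitted_exclusive_first() admits only one until the first reader downgrades. That waiting window is the serialisation being objected to.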

> +		result = nfs_sync_mapping(mapping);
> +		if (result) {
> +			inode_unlock(inode);
> +			goto out;
> +		}
> +		downgrade_write(&inode->i_rwsem);
> +	} else {
> +		inode_lock_shared(inode);
> +	}
> 
> 	task_io_account_read(count);
> 
> @@ -609,7 +616,7 @@ ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter)
> 	NFS_I(inode)->read_io += count;
> 	result = nfs_direct_read_schedule_iovec(dreq, iter, iocb->ki_pos);
> 
> -	inode_unlock(inode);
> +	inode_unlock_shared(inode);
> 
> 	if (!result) {
> 		result = nfs_direct_wait(dreq);
> @@ -623,7 +630,7 @@ ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter)
> out_release:
> 	nfs_direct_req_release(dreq);
> out_unlock:
> -	inode_unlock(inode);
> +	inode_unlock_shared(inode);
> out:
> 	return result;
> }
> @@ -1005,17 +1012,22 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter)
> 	pos = iocb->ki_pos;
> 	end = (pos + iov_iter_count(iter) - 1) >> PAGE_SHIFT;
> 
> -	inode_lock(inode);
> -
> -	result = nfs_sync_mapping(mapping);
> -	if (result)
> -		goto out_unlock;
> -
> 	if (mapping->nrpages) {
> -		result = invalidate_inode_pages2_range(mapping,
> -					pos >> PAGE_SHIFT, end);
> +		inode_lock(inode);
> +		result = nfs_sync_mapping(mapping);
> 		if (result)
> -			goto out_unlock;
> +			goto out_unlock_exclusive;
> +
> +		if (mapping->nrpages) {
> +			result = invalidate_inode_pages2_range(mapping,
> +					pos >> PAGE_SHIFT, end);
> +			if (result)	
> +				goto out_unlock_exclusive;
> +		}
> +
> +		downgrade_write(&inode->i_rwsem);
> +	} else {
> +		inode_lock_shared(inode);
> 	}
> 
> 	task_io_account_write(iov_iter_count(iter));
> @@ -1045,7 +1057,7 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter)
> 					      pos >> PAGE_SHIFT, end);
> 	}
> 
> -	inode_unlock(inode);
> +	inode_unlock_shared(inode);
> 
> 	if (!result) {
> 		result = nfs_direct_wait(dreq);
> @@ -1068,6 +1080,9 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter)
> out_release:
> 	nfs_direct_req_release(dreq);
> out_unlock:
> +	inode_unlock_shared(inode);
> +	return result;
> +out_unlock_exclusive:
> 	inode_unlock(inode);
> 	return result;
> }
> diff --git a/fs/nfs/file.c b/fs/nfs/file.c
> index 717a8d6..719296f 100644
> --- a/fs/nfs/file.c
> +++ b/fs/nfs/file.c
> @@ -170,12 +170,14 @@ nfs_file_read(struct kiocb *iocb, struct iov_iter *to)
> 		iocb->ki_filp,
> 		iov_iter_count(to), (unsigned long) iocb->ki_pos);
> 
> -	result = nfs_revalidate_mapping_protected(inode, iocb->ki_filp->f_mapping);
> +	inode_lock_shared(inode);
> +	result = nfs_revalidate_mapping(inode, iocb->ki_filp->f_mapping);
> 	if (!result) {
> 		result = generic_file_read_iter(iocb, to);
> 		if (result > 0)
> 			nfs_add_stats(inode, NFSIOS_NORMALREADBYTES, result);
> 	}
> +	inode_unlock_shared(inode);
> 	return result;
> }
> EXPORT_SYMBOL_GPL(nfs_file_read);
> -- 
> 2.1.4
> 


