* [PATCH v4 00/28] NFS writeback performance patches for v4.8
From: Trond Myklebust @ 2016-07-06 22:29 UTC
  To: linux-nfs

These patches aim to improve NFS I/O performance.

1) A new locking scheme between O_DIRECT and buffered I/O, in order to
   allow O_DIRECT to be parallelised.
2) Remove instances of inode_lock that are preventing parallelism of writes
   and attribute revalidation.
3) Remove a redundant write throttling scheme that was conspiring with
   the FLUSH_COND_STABLE mode to drive writeback performance down.
4) Cache attributes more aggressively when we can assume close-to-open cache
   consistency.
5) Don't force data sync to disk where it is not needed. If attributes need
   to be up to date, we can usually get by with unstable writes to the server,
   particularly when we're using NFSv4 and can rely on stateids to cause
   operations to fail on a server reboot.
6) Fix a number of bugs around pNFS and attributes. Some (but not all)
   pNFS dialects may need data sync + layoutcommit in order to ensure that
   data and attribute updates are visible.
   In particular, fix a bug that was causing file size changes to be
   clobbered while a layoutcommit was pending.

Trond Myklebust (28):
  NFS: Don't flush caches for a getattr that races with writeback
  NFS: Cache access checks more aggressively
  NFS: Cache aggressively when file is open for writing
  NFS: Kill NFS_INO_NFS_INO_FLUSHING: it is a performance killer
  NFS: writepage of a single page should not be synchronous
  NFS: Don't hold the inode lock across fsync()
  NFS: Don't call COMMIT in ->releasepage()
  pNFS/files: Fix layoutcommit after a commit to DS
  pNFS/flexfiles: Fix layoutcommit after a commit to DS
  pNFS/flexfiles: Clean up calls to pnfs_set_layoutcommit()
  pNFS: Files and flexfiles always need to commit before layoutcommit
  pNFS: Ensure we layoutcommit before revalidating attributes
  pNFS: pnfs_layoutcommit_outstanding() is no longer used when
    !CONFIG_NFS_V4_1
  NFS: Fix O_DIRECT verifier problems
  NFS: Ensure we reset the write verifier 'committed' value on resend.
  NFS: Remove racy size manipulations in O_DIRECT
  NFS Cleanup: move call to generic_write_checks() into fs/nfs/direct.c
  NFS: Move buffered I/O locking into nfs_file_write()
  NFS: Do not serialise O_DIRECT reads and writes
  NFS: Cleanup nfs_direct_complete()
  NFS: Remove redundant waits for O_DIRECT in fsync() and write_begin()
  NFS: Remove unused function nfs_revalidate_mapping_protected()
  NFS: Do not aggressively cache file attributes in the case of O_DIRECT
  NFS: Getattr doesn't require data sync semantics
  NFSv4.2: Fix a race in nfs42_proc_deallocate()
  NFSv4.2: Fix writeback races in nfs4_copy_file_range
  NFSv4.2: llseek(SEEK_HOLE) and llseek(SEEK_DATA) don't require data
    sync
  NFS nfs_vm_page_mkwrite: Don't freeze me, Bro...

 fs/nfs/Makefile                        |   2 +-
 fs/nfs/dir.c                           |  52 +++++++-----
 fs/nfs/direct.c                        |  93 +++++++--------------
 fs/nfs/file.c                          |  96 ++++++---------------
 fs/nfs/filelayout/filelayout.c         |  12 +--
 fs/nfs/flexfilelayout/flexfilelayout.c |  23 +++---
 fs/nfs/inode.c                         | 133 ++++++++++++++---------------
 fs/nfs/internal.h                      |  40 +++++++++
 fs/nfs/io.c                            | 147 +++++++++++++++++++++++++++++++++
 fs/nfs/nfs42proc.c                     |  21 ++++-
 fs/nfs/nfs4file.c                      |  14 +---
 fs/nfs/nfs4xdr.c                       |  11 ++-
 fs/nfs/nfstrace.h                      |   1 -
 fs/nfs/pnfs.c                          |   5 +-
 fs/nfs/pnfs.h                          |   7 --
 fs/nfs/pnfs_nfs.c                      |   7 ++
 fs/nfs/write.c                         |  33 +++++---
 include/linux/nfs_fs.h                 |   3 +-
 18 files changed, 420 insertions(+), 280 deletions(-)
 create mode 100644 fs/nfs/io.c

-- 
2.7.4
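Item 1 above describes two I/O modes that exclude each other while requests of the same mode may run in parallel. As a rough illustration only, here is a hypothetical user-space analogue built on POSIX threads. It is not the kernel's fs/nfs/io.c; the names mode_lock, IO_BUFFERED and IO_DIRECT are invented for this sketch, which models only the exclusion between the two modes and ignores any serialisation that buffered writes may need among themselves.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical analogue only: not the kernel's fs/nfs/io.c. */
enum io_mode { IO_BUFFERED, IO_DIRECT };

struct mode_lock {
	pthread_mutex_t mutex;
	pthread_cond_t cond;
	enum io_mode mode;	/* mode shared by all current holders */
	unsigned int holders;	/* number of current holders */
};

static void mode_lock_init(struct mode_lock *ml)
{
	pthread_mutex_init(&ml->mutex, NULL);
	pthread_cond_init(&ml->cond, NULL);
	ml->mode = IO_BUFFERED;
	ml->holders = 0;
}

/* Join the current holders if they use the same mode; never blocks. */
static bool mode_lock_try_acquire(struct mode_lock *ml, enum io_mode mode)
{
	bool ok;

	pthread_mutex_lock(&ml->mutex);
	ok = ml->holders == 0 || ml->mode == mode;
	if (ok) {
		ml->mode = mode;
		ml->holders++;
	}
	pthread_mutex_unlock(&ml->mutex);
	return ok;
}

/* Wait until no holder of the other mode remains, then join. */
static void mode_lock_acquire(struct mode_lock *ml, enum io_mode mode)
{
	pthread_mutex_lock(&ml->mutex);
	while (ml->holders != 0 && ml->mode != mode)
		pthread_cond_wait(&ml->cond, &ml->mutex);
	ml->mode = mode;
	ml->holders++;
	pthread_mutex_unlock(&ml->mutex);
}

static void mode_lock_release(struct mode_lock *ml)
{
	pthread_mutex_lock(&ml->mutex);
	assert(ml->holders > 0);	/* release without acquire is a bug */
	if (--ml->holders == 0)
		pthread_cond_broadcast(&ml->cond);	/* wake the other mode */
	pthread_mutex_unlock(&ml->mutex);
}
```

A steady stream of requests in one mode can keep the other mode waiting indefinitely in this sketch, so a real implementation would also need some fairness policy.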



* [PATCH v4 01/28] NFS: Don't flush caches for a getattr that races with writeback
From: Trond Myklebust @ 2016-07-06 22:29 UTC
  To: linux-nfs

If there were outstanding writes, then chalk up the unexpected change
attribute on the server to them instead of invalidating the caches.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/inode.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 52e7d6869e3b..60051e62d3f1 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -1729,12 +1729,15 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
 		if (inode->i_version != fattr->change_attr) {
 			dprintk("NFS: change_attr change on server for file %s/%ld\n",
 					inode->i_sb->s_id, inode->i_ino);
-			invalid |= NFS_INO_INVALID_ATTR
-				| NFS_INO_INVALID_DATA
-				| NFS_INO_INVALID_ACCESS
-				| NFS_INO_INVALID_ACL;
-			if (S_ISDIR(inode->i_mode))
-				nfs_force_lookup_revalidate(inode);
+			/* Could it be a race with writeback? */
+			if (nfsi->nrequests == 0) {
+				invalid |= NFS_INO_INVALID_ATTR
+					| NFS_INO_INVALID_DATA
+					| NFS_INO_INVALID_ACCESS
+					| NFS_INO_INVALID_ACL;
+				if (S_ISDIR(inode->i_mode))
+					nfs_force_lookup_revalidate(inode);
+			}
 			inode->i_version = fattr->change_attr;
 		}
 	} else {
-- 
2.7.4
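The decision this patch makes in nfs_update_inode() can be modelled in isolation. The sketch below is a hypothetical, simplified model: struct model_inode, on_change_attr() and the INVALID_* values are invented for illustration and do not match the kernel's definitions, and the directory-only nfs_force_lookup_revalidate() call is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values; the real NFS_INO_INVALID_* bits differ. */
#define INVALID_ATTR	(1u << 0)
#define INVALID_DATA	(1u << 1)
#define INVALID_ACCESS	(1u << 2)
#define INVALID_ACL	(1u << 3)

struct model_inode {
	uint64_t change_attr;		/* cached change attribute */
	unsigned long nrequests;	/* outstanding write requests */
};

/* Returns the invalidation flags to set when the server reports
 * change attribute server_change. */
static unsigned int on_change_attr(struct model_inode *inode,
				   uint64_t server_change)
{
	unsigned int invalid = 0;

	assert(inode != NULL);
	if (inode->change_attr != server_change) {
		/* Could it be a race with writeback? If writes are in
		 * flight, assume they explain the change. */
		if (inode->nrequests == 0)
			invalid |= INVALID_ATTR | INVALID_DATA |
				   INVALID_ACCESS | INVALID_ACL;
		/* Either way, remember the new value. */
		inode->change_attr = server_change;
	}
	return invalid;
}
```

Patch 03 later replaces the nrequests test with a check for open writers.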



* [PATCH v4 02/28] NFS: Cache access checks more aggressively
From: Trond Myklebust @ 2016-07-06 22:29 UTC
  To: linux-nfs

If an attribute revalidation fails, then we already know that we'll
zap the access cache. If, on the other hand, the inode isn't changing,
there should be no need to evict access cache entries just because they
are old.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/dir.c | 52 +++++++++++++++++++++++++++++++---------------------
 1 file changed, 31 insertions(+), 21 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index aaf7bd0cbae2..210b33636fe4 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -2228,21 +2228,37 @@ static struct nfs_access_entry *nfs_access_search_rbtree(struct inode *inode, st
 	return NULL;
 }
 
-static int nfs_access_get_cached(struct inode *inode, struct rpc_cred *cred, struct nfs_access_entry *res)
+static int nfs_access_get_cached(struct inode *inode, struct rpc_cred *cred, struct nfs_access_entry *res, bool may_block)
 {
 	struct nfs_inode *nfsi = NFS_I(inode);
 	struct nfs_access_entry *cache;
-	int err = -ENOENT;
+	bool retry = true;
+	int err;
 
 	spin_lock(&inode->i_lock);
-	if (nfsi->cache_validity & NFS_INO_INVALID_ACCESS)
-		goto out_zap;
-	cache = nfs_access_search_rbtree(inode, cred);
-	if (cache == NULL)
-		goto out;
-	if (!nfs_have_delegated_attributes(inode) &&
-	    !time_in_range_open(jiffies, cache->jiffies, cache->jiffies + nfsi->attrtimeo))
-		goto out_stale;
+	for(;;) {
+		if (nfsi->cache_validity & NFS_INO_INVALID_ACCESS)
+			goto out_zap;
+		cache = nfs_access_search_rbtree(inode, cred);
+		err = -ENOENT;
+		if (cache == NULL)
+			goto out;
+		/* Found an entry, is our attribute cache valid? */
+		if (!nfs_attribute_cache_expired(inode) &&
+		    !(nfsi->cache_validity & NFS_INO_INVALID_ATTR))
+			break;
+		err = -ECHILD;
+		if (!may_block)
+			goto out;
+		if (!retry)
+			goto out_zap;
+		spin_unlock(&inode->i_lock);
+		err = __nfs_revalidate_inode(NFS_SERVER(inode), inode);
+		if (err)
+			return err;
+		spin_lock(&inode->i_lock);
+		retry = false;
+	}
 	res->jiffies = cache->jiffies;
 	res->cred = cache->cred;
 	res->mask = cache->mask;
@@ -2251,12 +2267,6 @@ static int nfs_access_get_cached(struct inode *inode, struct rpc_cred *cred, str
 out:
 	spin_unlock(&inode->i_lock);
 	return err;
-out_stale:
-	rb_erase(&cache->rb_node, &nfsi->access_cache);
-	list_del(&cache->lru);
-	spin_unlock(&inode->i_lock);
-	nfs_access_free_entry(cache);
-	return -ENOENT;
 out_zap:
 	spin_unlock(&inode->i_lock);
 	nfs_access_zap_cache(inode);
@@ -2283,13 +2293,12 @@ static int nfs_access_get_cached_rcu(struct inode *inode, struct rpc_cred *cred,
 		cache = NULL;
 	if (cache == NULL)
 		goto out;
-	if (!nfs_have_delegated_attributes(inode) &&
-	    !time_in_range_open(jiffies, cache->jiffies, cache->jiffies + nfsi->attrtimeo))
+	err = nfs_revalidate_inode_rcu(NFS_SERVER(inode), inode);
+	if (err)
 		goto out;
 	res->jiffies = cache->jiffies;
 	res->cred = cache->cred;
 	res->mask = cache->mask;
-	err = 0;
 out:
 	rcu_read_unlock();
 	return err;
@@ -2378,18 +2387,19 @@ EXPORT_SYMBOL_GPL(nfs_access_set_mask);
 static int nfs_do_access(struct inode *inode, struct rpc_cred *cred, int mask)
 {
 	struct nfs_access_entry cache;
+	bool may_block = (mask & MAY_NOT_BLOCK) == 0;
 	int status;
 
 	trace_nfs_access_enter(inode);
 
 	status = nfs_access_get_cached_rcu(inode, cred, &cache);
 	if (status != 0)
-		status = nfs_access_get_cached(inode, cred, &cache);
+		status = nfs_access_get_cached(inode, cred, &cache, may_block);
 	if (status == 0)
 		goto out_cached;
 
 	status = -ECHILD;
-	if (mask & MAY_NOT_BLOCK)
+	if (!may_block)
 		goto out;
 
 	/* Be clever: ask server to check for all possible rights */
-- 
2.7.4
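After this patch, nfs_access_get_cached() trusts a cached entry whenever the attribute cache is valid, and otherwise revalidates the inode at most once before zapping the access cache. The following hypothetical model captures only that control flow: struct access_model and revalidate() (a stand-in for __nfs_revalidate_inode()) are invented, the NFS_INO_INVALID_ACCESS check is omitted, and the model assumes the zap path returns -ENOENT.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the state nfs_access_get_cached() inspects. */
struct access_model {
	bool have_entry;	/* cached entry exists for this cred */
	bool attrs_valid;	/* attribute cache neither expired nor invalid */
	int revalidate_result;	/* what a revalidation returns (0 or -errno) */
	bool revalidate_fixes;	/* does a successful revalidation validate attrs? */
	int revalidations;	/* revalidations done by the current lookup */
	bool zapped;		/* was the access cache zapped? */
};

/* Stand-in for __nfs_revalidate_inode(). */
static int revalidate(struct access_model *m)
{
	m->revalidations++;
	assert(m->revalidations == 1);	/* the loop revalidates at most once */
	if (m->revalidate_result == 0 && m->revalidate_fixes)
		m->attrs_valid = true;
	return m->revalidate_result;
}

static int access_get_cached(struct access_model *m, bool may_block)
{
	bool retry = true;
	int err;

	m->revalidations = 0;
	for (;;) {
		if (!m->have_entry)
			return -ENOENT;
		/* Valid attributes mean the entry is trusted, however old. */
		if (m->attrs_valid)
			return 0;
		if (!may_block)
			return -ECHILD;	/* caller must retry where it may block */
		if (!retry) {
			/* Still invalid after one revalidation: zap. */
			m->zapped = true;
			m->have_entry = false;
			return -ENOENT;
		}
		err = revalidate(m);
		if (err)
			return err;
		retry = false;
	}
}
```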



* [PATCH v4 03/28] NFS: Cache aggressively when file is open for writing
From: Trond Myklebust @ 2016-07-06 22:29 UTC
  To: linux-nfs

Unless the user is using file locking, we must assume close-to-open
cache consistency when the file is open for writing. Adjust the
caching algorithm so that it does not clear the cache on out-of-order
writes and/or attribute revalidations.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/file.c  | 13 ++----------
 fs/nfs/inode.c | 62 +++++++++++++++++++++++++++++++++++++++++-----------------
 2 files changed, 46 insertions(+), 29 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 717a8d6af52d..2d39d9f9da7d 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -780,11 +780,6 @@ do_unlk(struct file *filp, int cmd, struct file_lock *fl, int is_local)
 }
 
 static int
-is_time_granular(struct timespec *ts) {
-	return ((ts->tv_sec == 0) && (ts->tv_nsec <= 1000));
-}
-
-static int
 do_setlk(struct file *filp, int cmd, struct file_lock *fl, int is_local)
 {
 	struct inode *inode = filp->f_mapping->host;
@@ -817,12 +812,8 @@ do_setlk(struct file *filp, int cmd, struct file_lock *fl, int is_local)
 	 * This makes locking act as a cache coherency point.
 	 */
 	nfs_sync_mapping(filp->f_mapping);
-	if (!NFS_PROTO(inode)->have_delegation(inode, FMODE_READ)) {
-		if (is_time_granular(&NFS_SERVER(inode)->time_delta))
-			__nfs_revalidate_inode(NFS_SERVER(inode), inode);
-		else
-			nfs_zap_caches(inode);
-	}
+	if (!NFS_PROTO(inode)->have_delegation(inode, FMODE_READ))
+		nfs_zap_mapping(inode, filp->f_mapping);
 out:
 	return status;
 }
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 60051e62d3f1..4e65a5a8a01b 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -878,7 +878,10 @@ void nfs_inode_attach_open_context(struct nfs_open_context *ctx)
 	struct nfs_inode *nfsi = NFS_I(inode);
 
 	spin_lock(&inode->i_lock);
-	list_add(&ctx->list, &nfsi->open_files);
+	if (ctx->mode & FMODE_WRITE)
+		list_add(&ctx->list, &nfsi->open_files);
+	else
+		list_add_tail(&ctx->list, &nfsi->open_files);
 	spin_unlock(&inode->i_lock);
 }
 EXPORT_SYMBOL_GPL(nfs_inode_attach_open_context);
@@ -1215,6 +1218,25 @@ int nfs_revalidate_mapping_protected(struct inode *inode, struct address_space *
 	return __nfs_revalidate_mapping(inode, mapping, true);
 }
 
+static bool nfs_file_has_writers(struct nfs_inode *nfsi)
+{
+	struct inode *inode = &nfsi->vfs_inode;
+
+	assert_spin_locked(&inode->i_lock);
+
+	if (!S_ISREG(inode->i_mode))
+		return false;
+	if (list_empty(&nfsi->open_files))
+		return false;
+	/* Note: This relies on nfsi->open_files being ordered with writers
+	 *       being placed at the head of the list.
+	 *       See nfs_inode_attach_open_context()
+	 */
+	return (list_first_entry(&nfsi->open_files,
+			struct nfs_open_context,
+			list)->mode & FMODE_WRITE) == FMODE_WRITE;
+}
+
 static unsigned long nfs_wcc_update_inode(struct inode *inode, struct nfs_fattr *fattr)
 {
 	struct nfs_inode *nfsi = NFS_I(inode);
@@ -1279,22 +1301,24 @@ static int nfs_check_inode_attributes(struct inode *inode, struct nfs_fattr *fat
 	if ((fattr->valid & NFS_ATTR_FATTR_TYPE) && (inode->i_mode & S_IFMT) != (fattr->mode & S_IFMT))
 		return -EIO;
 
-	if ((fattr->valid & NFS_ATTR_FATTR_CHANGE) != 0 &&
-			inode->i_version != fattr->change_attr)
-		invalid |= NFS_INO_INVALID_ATTR|NFS_INO_REVAL_PAGECACHE;
+	if (!nfs_file_has_writers(nfsi)) {
+		/* Verify a few of the more important attributes */
+		if ((fattr->valid & NFS_ATTR_FATTR_CHANGE) != 0 && inode->i_version != fattr->change_attr)
+			invalid |= NFS_INO_INVALID_ATTR | NFS_INO_REVAL_PAGECACHE;
 
-	/* Verify a few of the more important attributes */
-	if ((fattr->valid & NFS_ATTR_FATTR_MTIME) && !timespec_equal(&inode->i_mtime, &fattr->mtime))
-		invalid |= NFS_INO_INVALID_ATTR;
+		if ((fattr->valid & NFS_ATTR_FATTR_MTIME) && !timespec_equal(&inode->i_mtime, &fattr->mtime))
+			invalid |= NFS_INO_INVALID_ATTR;
 
-	if (fattr->valid & NFS_ATTR_FATTR_SIZE) {
-		cur_size = i_size_read(inode);
-		new_isize = nfs_size_to_loff_t(fattr->size);
-		if (cur_size != new_isize)
-			invalid |= NFS_INO_INVALID_ATTR|NFS_INO_REVAL_PAGECACHE;
+		if ((fattr->valid & NFS_ATTR_FATTR_CTIME) && !timespec_equal(&inode->i_ctime, &fattr->ctime))
+			invalid |= NFS_INO_INVALID_ATTR;
+
+		if (fattr->valid & NFS_ATTR_FATTR_SIZE) {
+			cur_size = i_size_read(inode);
+			new_isize = nfs_size_to_loff_t(fattr->size);
+			if (cur_size != new_isize)
+				invalid |= NFS_INO_INVALID_ATTR|NFS_INO_REVAL_PAGECACHE;
+		}
 	}
-	if (nfsi->nrequests != 0)
-		invalid &= ~NFS_INO_REVAL_PAGECACHE;
 
 	/* Have any file permissions changed? */
 	if ((fattr->valid & NFS_ATTR_FATTR_MODE) && (inode->i_mode & S_IALLUGO) != (fattr->mode & S_IALLUGO))
@@ -1526,7 +1550,7 @@ EXPORT_SYMBOL_GPL(nfs_refresh_inode);
 
 static int nfs_post_op_update_inode_locked(struct inode *inode, struct nfs_fattr *fattr)
 {
-	unsigned long invalid = NFS_INO_INVALID_ATTR|NFS_INO_REVAL_PAGECACHE;
+	unsigned long invalid = NFS_INO_INVALID_ATTR;
 
 	/*
 	 * Don't revalidate the pagecache if we hold a delegation, but do
@@ -1675,6 +1699,7 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
 	unsigned long invalid = 0;
 	unsigned long now = jiffies;
 	unsigned long save_cache_validity;
+	bool have_writers = nfs_file_has_writers(nfsi);
 	bool cache_revalidated = true;
 
 	dfprintk(VFS, "NFS: %s(%s/%lu fh_crc=0x%08x ct=%d info=0x%x)\n",
@@ -1730,7 +1755,7 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
 			dprintk("NFS: change_attr change on server for file %s/%ld\n",
 					inode->i_sb->s_id, inode->i_ino);
 			/* Could it be a race with writeback? */
-			if (nfsi->nrequests == 0) {
+			if (!have_writers) {
 				invalid |= NFS_INO_INVALID_ATTR
 					| NFS_INO_INVALID_DATA
 					| NFS_INO_INVALID_ACCESS
@@ -1770,9 +1795,10 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
 		if (new_isize != cur_isize) {
 			/* Do we perhaps have any outstanding writes, or has
 			 * the file grown beyond our last write? */
-			if ((nfsi->nrequests == 0) || new_isize > cur_isize) {
+			if (nfsi->nrequests == 0 || new_isize > cur_isize) {
 				i_size_write(inode, new_isize);
-				invalid |= NFS_INO_INVALID_ATTR|NFS_INO_INVALID_DATA;
+				if (!have_writers)
+					invalid |= NFS_INO_INVALID_ATTR|NFS_INO_INVALID_DATA;
 			}
 			dprintk("NFS: isize change on server for file %s/%ld "
 					"(%Ld to %Ld)\n",
-- 
2.7.4
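nfs_file_has_writers() can answer in constant time because nfs_inode_attach_open_context() now keeps writers at the head of open_files and readers at the tail. Below is a hypothetical stand-alone model of that invariant, using a hand-rolled circular list in place of the kernel's list_head and omitting the S_ISREG() check and the i_lock assertion; all names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define MODE_READ	0x1
#define MODE_WRITE	0x2

struct open_ctx {
	int mode;
	struct open_ctx *prev, *next;
};

struct open_list {
	struct open_ctx head;	/* sentinel node */
};

static void open_list_init(struct open_list *l)
{
	l->head.prev = l->head.next = &l->head;
}

static void insert_between(struct open_ctx *n, struct open_ctx *prev,
			   struct open_ctx *next)
{
	n->prev = prev;
	n->next = next;
	prev->next = n;
	next->prev = n;
}

/* Writers go to the head, everyone else to the tail. */
static void attach_ctx(struct open_list *l, struct open_ctx *ctx)
{
	assert(ctx->mode & (MODE_READ | MODE_WRITE));
	if (ctx->mode & MODE_WRITE)
		insert_between(ctx, &l->head, l->head.next);
	else
		insert_between(ctx, l->head.prev, &l->head);
}

static void detach_ctx(struct open_ctx *ctx)
{
	ctx->prev->next = ctx->next;
	ctx->next->prev = ctx->prev;
}

/* O(1): only the first entry needs to be inspected. */
static bool has_writers(const struct open_list *l)
{
	const struct open_ctx *first = l->head.next;

	if (first == &l->head)
		return false;	/* empty list */
	return (first->mode & MODE_WRITE) == MODE_WRITE;
}
```

Removing any context leaves the remaining writers contiguous at the head, so the ordering only has to be maintained at insertion time.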



* [PATCH v4 04/28] NFS: Kill NFS_INO_NFS_INO_FLUSHING: it is a performance killer
From: Trond Myklebust @ 2016-07-06 22:29 UTC
  To: linux-nfs

filemap_fdatawrite() and friends already deal just fine with livelock.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/file.c          |  8 --------
 fs/nfs/nfstrace.h      |  1 -
 fs/nfs/write.c         | 11 -----------
 include/linux/nfs_fs.h |  1 -
 4 files changed, 21 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 2d39d9f9da7d..29d7477a62e8 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -360,14 +360,6 @@ static int nfs_write_begin(struct file *file, struct address_space *mapping,
 
 start:
 	/*
-	 * Prevent starvation issues if someone is doing a consistency
-	 * sync-to-disk
-	 */
-	ret = wait_on_bit_action(&NFS_I(mapping->host)->flags, NFS_INO_FLUSHING,
-				 nfs_wait_bit_killable, TASK_KILLABLE);
-	if (ret)
-		return ret;
-	/*
 	 * Wait for O_DIRECT to complete
 	 */
 	inode_dio_wait(mapping->host);
diff --git a/fs/nfs/nfstrace.h b/fs/nfs/nfstrace.h
index 0b9e5cc9a747..fe80a1c26340 100644
--- a/fs/nfs/nfstrace.h
+++ b/fs/nfs/nfstrace.h
@@ -37,7 +37,6 @@
 			{ 1 << NFS_INO_ADVISE_RDPLUS, "ADVISE_RDPLUS" }, \
 			{ 1 << NFS_INO_STALE, "STALE" }, \
 			{ 1 << NFS_INO_INVALIDATING, "INVALIDATING" }, \
-			{ 1 << NFS_INO_FLUSHING, "FLUSHING" }, \
 			{ 1 << NFS_INO_FSCACHE, "FSCACHE" }, \
 			{ 1 << NFS_INO_LAYOUTCOMMIT, "NEED_LAYOUTCOMMIT" }, \
 			{ 1 << NFS_INO_LAYOUTCOMMITTING, "LAYOUTCOMMIT" })
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index e1c74d3db64d..980d44f3a84c 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -657,16 +657,9 @@ static int nfs_writepages_callback(struct page *page, struct writeback_control *
 int nfs_writepages(struct address_space *mapping, struct writeback_control *wbc)
 {
 	struct inode *inode = mapping->host;
-	unsigned long *bitlock = &NFS_I(inode)->flags;
 	struct nfs_pageio_descriptor pgio;
 	int err;
 
-	/* Stop dirtying of new pages while we sync */
-	err = wait_on_bit_lock_action(bitlock, NFS_INO_FLUSHING,
-			nfs_wait_bit_killable, TASK_KILLABLE);
-	if (err)
-		goto out_err;
-
 	nfs_inc_stats(inode, NFSIOS_VFSWRITEPAGES);
 
 	nfs_pageio_init_write(&pgio, inode, wb_priority(wbc), false,
@@ -674,10 +667,6 @@ int nfs_writepages(struct address_space *mapping, struct writeback_control *wbc)
 	err = write_cache_pages(mapping, wbc, nfs_writepages_callback, &pgio);
 	nfs_pageio_complete(&pgio);
 
-	clear_bit_unlock(NFS_INO_FLUSHING, bitlock);
-	smp_mb__after_atomic();
-	wake_up_bit(bitlock, NFS_INO_FLUSHING);
-
 	if (err < 0)
 		goto out_err;
 	err = pgio.pg_error;
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index d71278c3c5bd..120dd04b553c 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -205,7 +205,6 @@ struct nfs_inode {
 #define NFS_INO_STALE		(1)		/* possible stale inode */
 #define NFS_INO_ACL_LRU_SET	(2)		/* Inode is on the LRU list */
 #define NFS_INO_INVALIDATING	(3)		/* inode is being invalidated */
-#define NFS_INO_FLUSHING	(4)		/* inode is flushing out data */
 #define NFS_INO_FSCACHE		(5)		/* inode can be cached by FS-Cache */
 #define NFS_INO_FSCACHE_LOCK	(6)		/* FS-Cache cookie management lock */
 #define NFS_INO_LAYOUTCOMMIT	(9)		/* layoutcommit required */
-- 
2.7.4



* [PATCH v4 05/28] NFS: writepage of a single page should not be synchronous
From: Trond Myklebust @ 2016-07-06 22:29 UTC
  To: linux-nfs

It is almost always better to wait for more writes to accumulate so
that we can issue a bulk commit.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/write.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 980d44f3a84c..b13d48881d3a 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -625,7 +625,7 @@ static int nfs_writepage_locked(struct page *page,
 	int err;
 
 	nfs_inc_stats(inode, NFSIOS_VFSWRITEPAGE);
-	nfs_pageio_init_write(&pgio, inode, wb_priority(wbc),
+	nfs_pageio_init_write(&pgio, inode, 0,
 				false, &nfs_async_write_completion_ops);
 	err = nfs_do_writepage(page, wbc, &pgio, launder);
 	nfs_pageio_complete(&pgio);
-- 
2.7.4



* [PATCH v4 06/28] NFS: Don't hold the inode lock across fsync()
From: Trond Myklebust @ 2016-07-06 22:29 UTC
  To: linux-nfs

Commits are no longer required to be serialised.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/file.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 29d7477a62e8..249262b6bcbe 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -277,11 +277,9 @@ nfs_file_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 		ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
 		if (ret != 0)
 			break;
-		inode_lock(inode);
 		ret = nfs_file_fsync_commit(file, start, end, datasync);
 		if (!ret)
 			ret = pnfs_sync_inode(inode, !!datasync);
-		inode_unlock(inode);
 		/*
 		 * If nfs_file_fsync_commit detected a server reboot, then
 		 * resend all dirty pages that might have been covered by
-- 
2.7.4



* [PATCH v4 07/28] NFS: Don't call COMMIT in ->releasepage()
From: Trond Myklebust @ 2016-07-06 22:29 UTC
  To: linux-nfs

While COMMIT has the potential to free up a lot of memory that is being
taken by unstable writes, it isn't guaranteed to free up this particular
page. Also, calling fsync() on the server is expensive and so we want to
do it in a more controlled fashion, rather than have it triggered at
random by the VM.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/file.c | 23 -----------------------
 1 file changed, 23 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 249262b6bcbe..df4dd8e7e62e 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -460,31 +460,8 @@ static void nfs_invalidate_page(struct page *page, unsigned int offset,
  */
 static int nfs_release_page(struct page *page, gfp_t gfp)
 {
-	struct address_space *mapping = page->mapping;
-
 	dfprintk(PAGECACHE, "NFS: release_page(%p)\n", page);
 
-	/* Always try to initiate a 'commit' if relevant, but only
-	 * wait for it if the caller allows blocking.  Even then,
-	 * only wait 1 second and only if the 'bdi' is not congested.
-	 * Waiting indefinitely can cause deadlocks when the NFS
-	 * server is on this machine, when a new TCP connection is
-	 * needed and in other rare cases.  There is no particular
-	 * need to wait extensively here.  A short wait has the
-	 * benefit that someone else can worry about the freezer.
-	 */
-	if (mapping) {
-		struct nfs_server *nfss = NFS_SERVER(mapping->host);
-		nfs_commit_inode(mapping->host, 0);
-		if (gfpflags_allow_blocking(gfp) &&
-		    !bdi_write_congested(&nfss->backing_dev_info)) {
-			wait_on_page_bit_killable_timeout(page, PG_private,
-							  HZ);
-			if (PagePrivate(page))
-				set_bdi_congested(&nfss->backing_dev_info,
-						  BLK_RW_ASYNC);
-		}
-	}
 	/* If PagePrivate() is set, then the page is not freeable */
 	if (PagePrivate(page))
 		return 0;
-- 
2.7.4



* [PATCH v4 08/28] pNFS/files: Fix layoutcommit after a commit to DS
From: Trond Myklebust @ 2016-07-06 22:29 UTC
  To: linux-nfs

According to RFC 5661 erratum 2751
https://www.rfc-editor.org/errata_search.php?rfc=5661&eid=2751
we should always send a layoutcommit after a commit to the DS.

Fixes: bc7d4b8fd091 ("nfs/filelayout: set layoutcommit...")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/filelayout/filelayout.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c
index aa59757389dc..b4c1407e8fe4 100644
--- a/fs/nfs/filelayout/filelayout.c
+++ b/fs/nfs/filelayout/filelayout.c
@@ -375,8 +375,7 @@ static int filelayout_commit_done_cb(struct rpc_task *task,
 		return -EAGAIN;
 	}
 
-	if (data->verf.committed == NFS_UNSTABLE)
-		pnfs_set_layoutcommit(data->inode, data->lseg, data->lwb);
+	pnfs_set_layoutcommit(data->inode, data->lseg, data->lwb);
 
 	return 0;
 }
-- 
2.7.4



* [PATCH v4 09/28] pNFS/flexfiles: Fix layoutcommit after a commit to DS
From: Trond Myklebust @ 2016-07-06 22:29 UTC
  To: linux-nfs

We should always do a layoutcommit after commit to DS, except if
the layout segment we're using has set FF_FLAGS_NO_LAYOUTCOMMIT.

Fixes: d67ae825a59d ("pnfs/flexfiles: Add the FlexFile Layout Driver")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/flexfilelayout/flexfilelayout.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index 0e8018bc9880..2689c9e9dc3c 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -1530,8 +1530,7 @@ static int ff_layout_commit_done_cb(struct rpc_task *task,
 		return -EAGAIN;
 	}
 
-	if (data->verf.committed == NFS_UNSTABLE
-	    && ff_layout_need_layoutcommit(data->lseg))
+	if (ff_layout_need_layoutcommit(data->lseg))
 		pnfs_set_layoutcommit(data->inode, data->lseg, data->lwb);
 
 	return 0;
-- 
2.7.4



* [PATCH v4 10/28] pNFS/flexfiles: Clean up calls to pnfs_set_layoutcommit()
From: Trond Myklebust @ 2016-07-06 22:29 UTC
  To: linux-nfs

Let's just have one place where we check ff_layout_need_layoutcommit().

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/flexfilelayout/flexfilelayout.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index 2689c9e9dc3c..14f2ed3f1a5b 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -1325,15 +1325,16 @@ ff_layout_need_layoutcommit(struct pnfs_layout_segment *lseg)
  * we always send layoutcommit after DS writes.
  */
 static void
-ff_layout_set_layoutcommit(struct nfs_pgio_header *hdr)
+ff_layout_set_layoutcommit(struct inode *inode,
+		struct pnfs_layout_segment *lseg,
+		loff_t end_offset)
 {
-	if (!ff_layout_need_layoutcommit(hdr->lseg))
+	if (!ff_layout_need_layoutcommit(lseg))
 		return;
 
-	pnfs_set_layoutcommit(hdr->inode, hdr->lseg,
-			hdr->mds_offset + hdr->res.count);
-	dprintk("%s inode %lu pls_end_pos %lu\n", __func__, hdr->inode->i_ino,
-		(unsigned long) NFS_I(hdr->inode)->layout->plh_lwb);
+	pnfs_set_layoutcommit(inode, lseg, end_offset);
+	dprintk("%s inode %lu pls_end_pos %llu\n", __func__, inode->i_ino,
+		(unsigned long long) NFS_I(inode)->layout->plh_lwb);
 }
 
 static bool
@@ -1494,7 +1495,8 @@ static int ff_layout_write_done_cb(struct rpc_task *task,
 
 	if (hdr->res.verf->committed == NFS_FILE_SYNC ||
 	    hdr->res.verf->committed == NFS_DATA_SYNC)
-		ff_layout_set_layoutcommit(hdr);
+		ff_layout_set_layoutcommit(hdr->inode, hdr->lseg,
+				hdr->mds_offset + (loff_t)hdr->res.count);
 
 	/* zero out fattr since we don't care DS attr at all */
 	hdr->fattr.valid = 0;
@@ -1530,8 +1532,7 @@ static int ff_layout_commit_done_cb(struct rpc_task *task,
 		return -EAGAIN;
 	}
 
-	if (ff_layout_need_layoutcommit(data->lseg))
-		pnfs_set_layoutcommit(data->inode, data->lseg, data->lwb);
+	ff_layout_set_layoutcommit(data->inode, data->lseg, data->lwb);
 
 	return 0;
 }
-- 
2.7.4



* [PATCH v4 11/28] pNFS: Files and flexfiles always need to commit before layoutcommit
From: Trond Myklebust @ 2016-07-06 22:29 UTC
  To: linux-nfs

So ensure that we mark the layout for commit once the write is done,
and then ensure that the commit to the DS is finished before sending
layoutcommit.

Note that by doing this, we're able to optimise away the commit
for the case of servers that don't need layoutcommit in order to
return updated attributes.
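With this change the files layout treats the three stable_how results of a DS WRITE differently: FILE_SYNC needs nothing further, DATA_SYNC can extend the layoutcommit range immediately, and UNSTABLE marks the layout but defers the range until the COMMIT to the DS completes. The following is a hypothetical model of filelayout_set_layoutcommit() as changed in the diff of this patch. The types are invented, pnfs_set_layoutcommit() is assumed to mark the layout and only ever move the last-written byte forward, and commit_lwb stands in for data->lwb.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for NFS_UNSTABLE, NFS_DATA_SYNC, NFS_FILE_SYNC. */
enum stable_how { UNSTABLE, DATA_SYNC, FILE_SYNC };

struct layout_model {
	bool need_layoutcommit;	/* like NFS_INO_LAYOUTCOMMIT */
	int64_t lwb;		/* last write byte to report, like plh_lwb */
};

/* Assumed behaviour of pnfs_set_layoutcommit(): mark the layout and only
 * ever move lwb forward. */
static void set_layoutcommit(struct layout_model *lo, int64_t end_pos)
{
	lo->need_layoutcommit = true;
	if (end_pos > lo->lwb)
		lo->lwb = end_pos;
}

/* Models filelayout_set_layoutcommit() after this patch. */
static void write_done(struct layout_model *lo, bool commit_through_mds,
		       enum stable_how committed, int64_t offset, uint32_t count)
{
	int64_t end_offs = 0;

	assert(offset >= 0);
	if (commit_through_mds || committed == FILE_SYNC)
		return;
	if (committed == DATA_SYNC)
		end_offs = offset + (int64_t)count;
	/* UNSTABLE: mark now, but leave lwb alone until the DS COMMIT. */
	set_layoutcommit(lo, end_offs);
}

/* After a COMMIT to the DS, patch 08 marks the layout unconditionally. */
static void commit_done(struct layout_model *lo, int64_t commit_lwb)
{
	set_layoutcommit(lo, commit_lwb);
}
```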

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/filelayout/filelayout.c         |  9 ++++++---
 fs/nfs/flexfilelayout/flexfilelayout.c |  7 +++++--
 fs/nfs/nfs4xdr.c                       | 11 ++++++++---
 fs/nfs/pnfs.c                          |  5 ++++-
 fs/nfs/pnfs_nfs.c                      |  7 +++++++
 5 files changed, 30 insertions(+), 9 deletions(-)

diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c
index b4c1407e8fe4..25bd91a6e088 100644
--- a/fs/nfs/filelayout/filelayout.c
+++ b/fs/nfs/filelayout/filelayout.c
@@ -255,13 +255,16 @@ static int filelayout_read_done_cb(struct rpc_task *task,
 static void
 filelayout_set_layoutcommit(struct nfs_pgio_header *hdr)
 {
+	loff_t end_offs = 0;
 
 	if (FILELAYOUT_LSEG(hdr->lseg)->commit_through_mds ||
-	    hdr->res.verf->committed != NFS_DATA_SYNC)
+	    hdr->res.verf->committed == NFS_FILE_SYNC)
 		return;
+	if (hdr->res.verf->committed == NFS_DATA_SYNC)
+		end_offs = hdr->mds_offset + (loff_t)hdr->res.count;
 
-	pnfs_set_layoutcommit(hdr->inode, hdr->lseg,
-			hdr->mds_offset + hdr->res.count);
+	/* Note: if the write is unstable, don't set end_offs until commit */
+	pnfs_set_layoutcommit(hdr->inode, hdr->lseg, end_offs);
 	dprintk("%s inode %lu pls_end_pos %lu\n", __func__, hdr->inode->i_ino,
 		(unsigned long) NFS_I(hdr->inode)->layout->plh_lwb);
 }
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index 14f2ed3f1a5b..e6206eaf2bdf 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -1470,6 +1470,7 @@ static void ff_layout_read_release(void *data)
 static int ff_layout_write_done_cb(struct rpc_task *task,
 				struct nfs_pgio_header *hdr)
 {
+	loff_t end_offs = 0;
 	int err;
 
 	trace_nfs4_pnfs_write(hdr, task->tk_status);
@@ -1495,8 +1496,10 @@ static int ff_layout_write_done_cb(struct rpc_task *task,
 
 	if (hdr->res.verf->committed == NFS_FILE_SYNC ||
 	    hdr->res.verf->committed == NFS_DATA_SYNC)
-		ff_layout_set_layoutcommit(hdr->inode, hdr->lseg,
-				hdr->mds_offset + (loff_t)hdr->res.count);
+		end_offs = hdr->mds_offset + (loff_t)hdr->res.count;
+
+	/* Note: if the write is unstable, don't set end_offs until commit */
+	ff_layout_set_layoutcommit(hdr->inode, hdr->lseg, end_offs);
 
 	/* zero out fattr since we don't care DS attr at all */
 	hdr->fattr.valid = 0;
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index 661e753fe1c9..7bd3a5c09d31 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -1985,9 +1985,14 @@ encode_layoutcommit(struct xdr_stream *xdr,
 	p = xdr_encode_hyper(p, args->lastbytewritten + 1);	/* length */
 	*p = cpu_to_be32(0); /* reclaim */
 	encode_nfs4_stateid(xdr, &args->stateid);
-	p = reserve_space(xdr, 20);
-	*p++ = cpu_to_be32(1); /* newoffset = TRUE */
-	p = xdr_encode_hyper(p, args->lastbytewritten);
+	if (args->lastbytewritten != U64_MAX) {
+		p = reserve_space(xdr, 20);
+		*p++ = cpu_to_be32(1); /* newoffset = TRUE */
+		p = xdr_encode_hyper(p, args->lastbytewritten);
+	} else {
+		p = reserve_space(xdr, 12);
+		*p++ = cpu_to_be32(0); /* newoffset = FALSE */
+	}
 	*p++ = cpu_to_be32(0); /* Never send time_modify_changed */
 	*p++ = cpu_to_be32(NFS_SERVER(args->inode)->pnfs_curr_ld->id);/* type */
 
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 0c7e0d45a4de..62553182514e 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -2378,7 +2378,10 @@ pnfs_layoutcommit_inode(struct inode *inode, bool sync)
 	nfs_fattr_init(&data->fattr);
 	data->args.bitmask = NFS_SERVER(inode)->cache_consistency_bitmask;
 	data->res.fattr = &data->fattr;
-	data->args.lastbytewritten = end_pos - 1;
+	if (end_pos != 0)
+		data->args.lastbytewritten = end_pos - 1;
+	else
+		data->args.lastbytewritten = U64_MAX;
 	data->res.server = NFS_SERVER(inode);
 
 	if (ld->prepare_layoutcommit) {
diff --git a/fs/nfs/pnfs_nfs.c b/fs/nfs/pnfs_nfs.c
index 0dfc476da3e1..0d10cc280a23 100644
--- a/fs/nfs/pnfs_nfs.c
+++ b/fs/nfs/pnfs_nfs.c
@@ -932,6 +932,13 @@ EXPORT_SYMBOL_GPL(pnfs_layout_mark_request_commit);
 int
 pnfs_nfs_generic_sync(struct inode *inode, bool datasync)
 {
+	int ret;
+
+	if (!pnfs_layoutcommit_outstanding(inode))
+		return 0;
+	ret = nfs_commit_inode(inode, FLUSH_SYNC);
+	if (ret < 0)
+		return ret;
 	if (datasync)
 		return 0;
 	return pnfs_layoutcommit_inode(inode, true);
-- 
2.7.4
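
The end-offset bookkeeping in patch 11 can be modelled in user space as shown below. This is an illustrative sketch only. `enum stable_how`, `write_end_offs()`, `last_byte_written()` and `send_newoffset()` are hypothetical names, not kernel API. The model follows the flexfiles variant, where both FILE_SYNC and DATA_SYNC record an end offset (the files layout returns early for FILE_SYNC). An end offset of 0 means "unstable write, wait for the COMMIT", and UINT64_MAX stands in for U64_MAX as the "no new offset" sentinel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the NFS stable_how values. */
enum stable_how { UNSTABLE = 0, DATA_SYNC = 1, FILE_SYNC = 2 };

/*
 * End offset to record for layoutcommit when a DS write completes.
 * An unstable write reports 0: the data only becomes durable once
 * the COMMIT to the DS completes, so the offset is set at that point.
 */
static int64_t write_end_offs(enum stable_how committed,
			      int64_t mds_offset, uint32_t count)
{
	if (committed == UNSTABLE)
		return 0;
	return mds_offset + (int64_t)count;
}

/* Models the pnfs_layoutcommit_inode() change: 0 maps to the sentinel. */
static uint64_t last_byte_written(int64_t end_pos)
{
	return end_pos != 0 ? (uint64_t)(end_pos - 1) : UINT64_MAX;
}

/* Models encode_layoutcommit(): newoffset = TRUE only for a real offset. */
static bool send_newoffset(uint64_t lastbytewritten)
{
	return lastbytewritten != UINT64_MAX;
}
```

For example, a DATA_SYNC write of 512 bytes at offset 4096 records an end offset of 4608, which yields lastbytewritten 4607 and newoffset = TRUE. An UNSTABLE write records 0, which maps to the sentinel and results in newoffset = FALSE.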



* [PATCH v4 12/28] pNFS: Ensure we layoutcommit before revalidating attributes
  2016-07-06 22:29                     ` [PATCH v4 11/28] pNFS: Files and flexfiles always need to commit before layoutcommit Trond Myklebust
@ 2016-07-06 22:29                       ` Trond Myklebust
  2016-07-06 22:29                         ` [PATCH v4 13/28] pNFS: pnfs_layoutcommit_outstanding() is no longer used when !CONFIG_NFS_V4_1 Trond Myklebust
  0 siblings, 1 reply; 69+ messages in thread
From: Trond Myklebust @ 2016-07-06 22:29 UTC (permalink / raw)
  To: linux-nfs

If we need to update the cached attributes, then we'd better make
sure that we also layoutcommit first. Otherwise, the server may have stale
attributes.

Prior to this patch, the revalidation code tried to "fix" this problem by
simply disabling attributes that would be affected by the layoutcommit.
That approach breaks nfs_writeback_check_extend(), leading to file size
corruption.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/inode.c | 23 +++++++----------------
 1 file changed, 7 insertions(+), 16 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 4e65a5a8a01b..6c0618eb5d57 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -974,6 +974,13 @@ __nfs_revalidate_inode(struct nfs_server *server, struct inode *inode)
 	if (NFS_STALE(inode))
 		goto out;
 
+	/* pNFS: Attributes aren't updated until we layoutcommit */
+	if (S_ISREG(inode->i_mode)) {
+		status = pnfs_sync_inode(inode, false);
+		if (status)
+			goto out;
+	}
+
 	status = -ENOMEM;
 	fattr = nfs_alloc_fattr();
 	if (fattr == NULL)
@@ -1493,28 +1500,12 @@ static int nfs_inode_attrs_need_update(const struct inode *inode, const struct n
 		((long)nfsi->attr_gencount - (long)nfs_read_attr_generation_counter() > 0);
 }
 
-/*
- * Don't trust the change_attribute, mtime, ctime or size if
- * a pnfs LAYOUTCOMMIT is outstanding
- */
-static void nfs_inode_attrs_handle_layoutcommit(struct inode *inode,
-		struct nfs_fattr *fattr)
-{
-	if (pnfs_layoutcommit_outstanding(inode))
-		fattr->valid &= ~(NFS_ATTR_FATTR_CHANGE |
-				NFS_ATTR_FATTR_MTIME |
-				NFS_ATTR_FATTR_CTIME |
-				NFS_ATTR_FATTR_SIZE);
-}
-
 static int nfs_refresh_inode_locked(struct inode *inode, struct nfs_fattr *fattr)
 {
 	int ret;
 
 	trace_nfs_refresh_inode_enter(inode);
 
-	nfs_inode_attrs_handle_layoutcommit(inode, fattr);
-
 	if (nfs_inode_attrs_need_update(inode, fattr))
 		ret = nfs_update_inode(inode, fattr);
 	else
-- 
2.7.4



* [PATCH v4 13/28] pNFS: pnfs_layoutcommit_outstanding() is no longer used when !CONFIG_NFS_V4_1
  2016-07-06 22:29                       ` [PATCH v4 12/28] pNFS: Ensure we layoutcommit before revalidating attributes Trond Myklebust
@ 2016-07-06 22:29                         ` Trond Myklebust
  2016-07-06 22:29                           ` [PATCH v4 14/28] NFS: Fix O_DIRECT verifier problems Trond Myklebust
  0 siblings, 1 reply; 69+ messages in thread
From: Trond Myklebust @ 2016-07-06 22:29 UTC (permalink / raw)
  To: linux-nfs

Cleanup...

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/pnfs.h | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index b21bd0bee784..d6be5299a55a 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -716,13 +716,6 @@ pnfs_use_threshold(struct nfs4_threshold **dst, struct nfs4_threshold *src,
 	return false;
 }
 
-static inline bool
-pnfs_layoutcommit_outstanding(struct inode *inode)
-{
-	return false;
-}
-
-
 static inline struct nfs4_threshold *pnfs_mdsthreshold_alloc(void)
 {
 	return NULL;
-- 
2.7.4



* [PATCH v4 14/28] NFS: Fix O_DIRECT verifier problems
  2016-07-06 22:29                         ` [PATCH v4 13/28] pNFS: pnfs_layoutcommit_outstanding() is no longer used when !CONFIG_NFS_V4_1 Trond Myklebust
@ 2016-07-06 22:29                           ` Trond Myklebust
  2016-07-06 22:29                             ` [PATCH v4 15/28] NFS: Ensure we reset the write verifier 'committed' value on resend Trond Myklebust
  0 siblings, 1 reply; 69+ messages in thread
From: Trond Myklebust @ 2016-07-06 22:29 UTC (permalink / raw)
  To: linux-nfs

When comparing write verifiers, only the verifier itself matters. We
should not be looking at the value of the stable ("committed") field,
since that could take any value.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/direct.c   | 10 ++++++++--
 fs/nfs/internal.h |  7 +++++++
 fs/nfs/write.c    |  2 +-
 3 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 979b3c4dee6a..d6d43b5eafb3 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -196,6 +196,12 @@ static void nfs_direct_set_hdr_verf(struct nfs_direct_req *dreq,
 	WARN_ON_ONCE(verfp->committed < 0);
 }
 
+static int nfs_direct_cmp_verf(const struct nfs_writeverf *v1,
+		const struct nfs_writeverf *v2)
+{
+	return nfs_write_verifier_cmp(&v1->verifier, &v2->verifier);
+}
+
 /*
  * nfs_direct_cmp_hdr_verf - compare verifier for pgio header
  * @dreq - direct request possibly spanning multiple servers
@@ -215,7 +221,7 @@ static int nfs_direct_set_or_cmp_hdr_verf(struct nfs_direct_req *dreq,
 		nfs_direct_set_hdr_verf(dreq, hdr);
 		return 0;
 	}
-	return memcmp(verfp, &hdr->verf, sizeof(struct nfs_writeverf));
+	return nfs_direct_cmp_verf(verfp, &hdr->verf);
 }
 
 /*
@@ -238,7 +244,7 @@ static int nfs_direct_cmp_commit_data_verf(struct nfs_direct_req *dreq,
 	if (verfp->committed < 0)
 		return 1;
 
-	return memcmp(verfp, &data->verf, sizeof(struct nfs_writeverf));
+	return nfs_direct_cmp_verf(verfp, &data->verf);
 }
 
 /**
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 5154fa65a2f2..150a8eb0f323 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -506,6 +506,13 @@ extern int nfs_migrate_page(struct address_space *,
 #define nfs_migrate_page NULL
 #endif
 
+static inline int
+nfs_write_verifier_cmp(const struct nfs_write_verifier *v1,
+		const struct nfs_write_verifier *v2)
+{
+	return memcmp(v1->data, v2->data, sizeof(v1->data));
+}
+
 /* unlink.c */
 extern struct rpc_task *
 nfs_async_rename(struct inode *old_dir, struct inode *new_dir,
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index b13d48881d3a..3087fb6f1983 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1789,7 +1789,7 @@ static void nfs_commit_release_pages(struct nfs_commit_data *data)
 
 		/* Okay, COMMIT succeeded, apparently. Check the verifier
 		 * returned by the server against all stored verfs. */
-		if (!memcmp(&req->wb_verf, &data->verf.verifier, sizeof(req->wb_verf))) {
+		if (!nfs_write_verifier_cmp(&req->wb_verf, &data->verf.verifier)) {
 			/* We have a match */
 			nfs_inode_remove_request(req);
 			dprintk(" OK\n");
-- 
2.7.4
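
The user-space sketch below shows why patch 14 stops calling memcmp() on the whole struct nfs_writeverf. The types `struct write_verifier` and `struct writeverf` are simplified, hypothetical stand-ins for the kernel structures, assuming an 8-byte opaque verifier. If two replies carry the same verifier but different `committed` values, a whole-struct comparison reports them as unequal. A comparison of only the verifier bytes reports them as equal.

```c
#include <assert.h>
#include <string.h>

/* Simplified stand-in for struct nfs_write_verifier (8 opaque bytes). */
struct write_verifier {
	char data[8];
};

/* Simplified stand-in for struct nfs_writeverf. */
struct writeverf {
	struct write_verifier verifier;
	int committed;		/* stable_how reported by the server */
};

/* Compare only the verifier bytes, in the style of nfs_write_verifier_cmp(). */
static int verifier_cmp(const struct write_verifier *v1,
			const struct write_verifier *v2)
{
	return memcmp(v1->data, v2->data, sizeof(v1->data));
}
```

A whole-struct memcmp() would also compare any padding bytes, which is one more reason to compare only the field that actually identifies the server instance.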



* [PATCH v4 15/28] NFS: Ensure we reset the write verifier 'committed' value on resend.
  2016-07-06 22:29                           ` [PATCH v4 14/28] NFS: Fix O_DIRECT verifier problems Trond Myklebust
@ 2016-07-06 22:29                             ` Trond Myklebust
  2016-07-06 22:29                               ` [PATCH v4 16/28] NFS: Remove racy size manipulations in O_DIRECT Trond Myklebust
  0 siblings, 1 reply; 69+ messages in thread
From: Trond Myklebust @ 2016-07-06 22:29 UTC (permalink / raw)
  To: linux-nfs

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/direct.c   |  2 ++
 fs/nfs/internal.h | 17 +++++++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index d6d43b5eafb3..fb659bb50678 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -661,6 +661,8 @@ static void nfs_direct_write_reschedule(struct nfs_direct_req *dreq)
 	nfs_direct_write_scan_commit_list(dreq->inode, &reqs, &cinfo);
 
 	dreq->count = 0;
+	dreq->verf.committed = NFS_INVALID_STABLE_HOW;
+	nfs_clear_pnfs_ds_commit_verifiers(&dreq->ds_cinfo);
 	for (i = 0; i < dreq->mirror_count; i++)
 		dreq->mirrors[i].count = 0;
 	get_dreq(dreq);
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 150a8eb0f323..0eb5c924886d 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -499,6 +499,23 @@ int nfs_key_timeout_notify(struct file *filp, struct inode *inode);
 bool nfs_ctx_key_to_expire(struct nfs_open_context *ctx);
 void nfs_pageio_stop_mirroring(struct nfs_pageio_descriptor *pgio);
 
+#ifdef CONFIG_NFS_V4_1
+static inline
+void nfs_clear_pnfs_ds_commit_verifiers(struct pnfs_ds_commit_info *cinfo)
+{
+	int i;
+
+	for (i = 0; i < cinfo->nbuckets; i++)
+		cinfo->buckets[i].direct_verf.committed = NFS_INVALID_STABLE_HOW;
+}
+#else
+static inline
+void nfs_clear_pnfs_ds_commit_verifiers(struct pnfs_ds_commit_info *cinfo)
+{
+}
+#endif
+
+
 #ifdef CONFIG_MIGRATION
 extern int nfs_migrate_page(struct address_space *,
 		struct page *, struct page *, enum migrate_mode);
-- 
2.7.4
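
The reset that patch 15 adds can be illustrated with a simplified model. This is a hedged reading of the diff, not the kernel code. It assumes that a saved direct-I/O verifier counts as unset while `committed` is negative, as the `committed < 0` checks in fs/nfs/direct.c suggest. Under that assumption, resending without a reset would compare the new replies against a verifier left over from the previous round. `struct dverf`, `set_or_cmp()`, `reset_for_resend()` and `INVALID_STABLE_HOW` are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sentinel: any negative value means "no verifier saved". */
#define INVALID_STABLE_HOW (-1)

struct dverf {
	char verifier[8];
	int committed;
};

/* Save the first reply's verifier; compare later replies against it. */
static int set_or_cmp(struct dverf *saved, const struct dverf *reply)
{
	if (saved->committed < 0) {
		*saved = *reply;
		return 0;
	}
	return memcmp(saved->verifier, reply->verifier,
		      sizeof(saved->verifier));
}

/* What the resend path must do: forget the verifier from the last round. */
static void reset_for_resend(struct dverf *saved)
{
	saved->committed = INVALID_STABLE_HOW;
}
```

For example, suppose the server reboots between rounds so that the resent writes return a new verifier. Without the reset, every reply looks like a mismatch. With the reset, the first reply of the new round is recorded afresh.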



* [PATCH v4 16/28] NFS: Remove racy size manipulations in O_DIRECT
  2016-07-06 22:29                             ` [PATCH v4 15/28] NFS: Ensure we reset the write verifier 'committed' value on resend Trond Myklebust
@ 2016-07-06 22:29                               ` Trond Myklebust
  2016-07-06 22:29                                 ` [PATCH v4 17/28] NFS Cleanup: move call to generic_write_checks() into fs/nfs/direct.c Trond Myklebust
  0 siblings, 1 reply; 69+ messages in thread
From: Trond Myklebust @ 2016-07-06 22:29 UTC (permalink / raw)
  To: linux-nfs

On success, the RPC callbacks already ensure that we make the appropriate
calls to nfs_writeback_update_inode(), so the manual i_size updates in
the O_DIRECT path are redundant and racy.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/direct.c | 16 ----------------
 1 file changed, 16 deletions(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index fb659bb50678..826d4dace0e5 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -376,15 +376,6 @@ static void nfs_direct_complete(struct nfs_direct_req *dreq, bool write)
 {
 	struct inode *inode = dreq->inode;
 
-	if (dreq->iocb && write) {
-		loff_t pos = dreq->iocb->ki_pos + dreq->count;
-
-		spin_lock(&inode->i_lock);
-		if (i_size_read(inode) < pos)
-			i_size_write(inode, pos);
-		spin_unlock(&inode->i_lock);
-	}
-
 	if (write)
 		nfs_zap_mapping(inode, inode->i_mapping);
 
@@ -1058,14 +1049,7 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter)
 	if (!result) {
 		result = nfs_direct_wait(dreq);
 		if (result > 0) {
-			struct inode *inode = mapping->host;
-
 			iocb->ki_pos = pos + result;
-			spin_lock(&inode->i_lock);
-			if (i_size_read(inode) < iocb->ki_pos)
-				i_size_write(inode, iocb->ki_pos);
-			spin_unlock(&inode->i_lock);
-
 			/* XXX: should check the generic_write_sync retval */
 			generic_write_sync(iocb, result);
 		}
-- 
2.7.4



* [PATCH v4 17/28] NFS Cleanup: move call to generic_write_checks() into fs/nfs/direct.c
  2016-07-06 22:29                               ` [PATCH v4 16/28] NFS: Remove racy size manipulations in O_DIRECT Trond Myklebust
@ 2016-07-06 22:29                                 ` Trond Myklebust
  2016-07-06 22:29                                   ` [PATCH v4 18/28] NFS: Move buffered I/O locking into nfs_file_write() Trond Myklebust
  0 siblings, 1 reply; 69+ messages in thread
From: Trond Myklebust @ 2016-07-06 22:29 UTC (permalink / raw)
  To: linux-nfs

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/direct.c | 12 ++++++++----
 fs/nfs/file.c   |  6 +-----
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 826d4dace0e5..0169eca8eb42 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -988,6 +988,7 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
 ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter)
 {
 	ssize_t result = -EINVAL;
+	size_t count;
 	struct file *file = iocb->ki_filp;
 	struct address_space *mapping = file->f_mapping;
 	struct inode *inode = mapping->host;
@@ -998,8 +999,11 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter)
 	dfprintk(FILE, "NFS: direct write(%pD2, %zd@%Ld)\n",
 		file, iov_iter_count(iter), (long long) iocb->ki_pos);
 
-	nfs_add_stats(mapping->host, NFSIOS_DIRECTWRITTENBYTES,
-		      iov_iter_count(iter));
+	result = generic_write_checks(iocb, iter);
+	if (result <= 0)
+		return result;
+	count = result;
+	nfs_add_stats(mapping->host, NFSIOS_DIRECTWRITTENBYTES, count);
 
 	pos = iocb->ki_pos;
 	end = (pos + iov_iter_count(iter) - 1) >> PAGE_SHIFT;
@@ -1017,7 +1021,7 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter)
 			goto out_unlock;
 	}
 
-	task_io_account_write(iov_iter_count(iter));
+	task_io_account_write(count);
 
 	result = -ENOMEM;
 	dreq = nfs_direct_req_alloc();
@@ -1025,7 +1029,7 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter)
 		goto out_unlock;
 
 	dreq->inode = inode;
-	dreq->bytes_left = dreq->max_count = iov_iter_count(iter);
+	dreq->bytes_left = dreq->max_count = count;
 	dreq->io_start = pos;
 	dreq->ctx = get_nfs_open_context(nfs_file_open_context(iocb->ki_filp));
 	l_ctx = nfs_get_lock_context(dreq->ctx);
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index df4dd8e7e62e..c26847c84d00 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -629,12 +629,8 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from)
 	if (result)
 		return result;
 
-	if (iocb->ki_flags & IOCB_DIRECT) {
-		result = generic_write_checks(iocb, from);
-		if (result <= 0)
-			return result;
+	if (iocb->ki_flags & IOCB_DIRECT)
 		return nfs_file_direct_write(iocb, from);
-	}
 
 	dprintk("NFS: write(%pD2, %zu@%Ld)\n",
 		file, count, (long long) iocb->ki_pos);
-- 
2.7.4



* [PATCH v4 18/28] NFS: Move buffered I/O locking into nfs_file_write()
  2016-07-06 22:29                                 ` [PATCH v4 17/28] NFS Cleanup: move call to generic_write_checks() into fs/nfs/direct.c Trond Myklebust
@ 2016-07-06 22:29                                   ` Trond Myklebust
  2016-07-06 22:29                                     ` [PATCH v4 19/28] NFS: Do not serialise O_DIRECT reads and writes Trond Myklebust
  0 siblings, 1 reply; 69+ messages in thread
From: Trond Myklebust @ 2016-07-06 22:29 UTC (permalink / raw)
  To: linux-nfs

Preparation for the patch that de-serialises O_DIRECT reads and
writes.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/file.c | 27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index c26847c84d00..46cf0afe3c0f 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -623,7 +623,6 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from)
 	struct inode *inode = file_inode(file);
 	unsigned long written = 0;
 	ssize_t result;
-	size_t count = iov_iter_count(from);
 
 	result = nfs_key_timeout_notify(file, inode);
 	if (result)
@@ -633,9 +632,8 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from)
 		return nfs_file_direct_write(iocb, from);
 
 	dprintk("NFS: write(%pD2, %zu@%Ld)\n",
-		file, count, (long long) iocb->ki_pos);
+		file, iov_iter_count(from), (long long) iocb->ki_pos);
 
-	result = -EBUSY;
 	if (IS_SWAPFILE(inode))
 		goto out_swapfile;
 	/*
@@ -647,28 +645,33 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from)
 			goto out;
 	}
 
-	result = count;
-	if (!count)
+	inode_lock(inode);
+	result = generic_write_checks(iocb, from);
+	if (result > 0) {
+		current->backing_dev_info = inode_to_bdi(inode);
+		result = generic_perform_write(file, from, iocb->ki_pos);
+		current->backing_dev_info = NULL;
+	}
+	inode_unlock(inode);
+	if (result <= 0)
 		goto out;
 
-	result = generic_file_write_iter(iocb, from);
-	if (result > 0)
-		written = result;
+	written = generic_write_sync(iocb, result);
+	iocb->ki_pos += written;
 
 	/* Return error values */
-	if (result >= 0 && nfs_need_check_write(file, inode)) {
+	if (nfs_need_check_write(file, inode)) {
 		int err = vfs_fsync(file, 0);
 		if (err < 0)
 			result = err;
 	}
-	if (result > 0)
-		nfs_add_stats(inode, NFSIOS_NORMALWRITTENBYTES, written);
+	nfs_add_stats(inode, NFSIOS_NORMALWRITTENBYTES, written);
 out:
 	return result;
 
 out_swapfile:
 	printk(KERN_INFO "NFS: attempt to write to active swap file!\n");
-	goto out;
+	return -EBUSY;
 }
 EXPORT_SYMBOL_GPL(nfs_file_write);
 
-- 
2.7.4



* [PATCH v4 19/28] NFS: Do not serialise O_DIRECT reads and writes
  2016-07-06 22:29                                   ` [PATCH v4 18/28] NFS: Move buffered I/O locking into nfs_file_write() Trond Myklebust
@ 2016-07-06 22:29                                     ` Trond Myklebust
  2016-07-06 22:29                                       ` [PATCH v4 20/28] NFS: Cleanup nfs_direct_complete() Trond Myklebust
  0 siblings, 1 reply; 69+ messages in thread
From: Trond Myklebust @ 2016-07-06 22:29 UTC (permalink / raw)
  To: linux-nfs

Allow dio requests to be scheduled in parallel, while ensuring that they
do not conflict with buffered I/O.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/Makefile        |   2 +-
 fs/nfs/direct.c        |  41 +++-----------
 fs/nfs/file.c          |  12 ++--
 fs/nfs/internal.h      |   8 +++
 fs/nfs/io.c            | 147 +++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/nfs_fs.h |   1 +
 6 files changed, 174 insertions(+), 37 deletions(-)
 create mode 100644 fs/nfs/io.c

diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
index 8664417955a2..6abdda209642 100644
--- a/fs/nfs/Makefile
+++ b/fs/nfs/Makefile
@@ -6,7 +6,7 @@ obj-$(CONFIG_NFS_FS) += nfs.o
 
 CFLAGS_nfstrace.o += -I$(src)
 nfs-y 			:= client.o dir.o file.o getroot.o inode.o super.o \
-			   direct.o pagelist.o read.o symlink.o unlink.o \
+			   io.o direct.o pagelist.o read.o symlink.o unlink.o \
 			   write.o namespace.o mount_clnt.o nfstrace.o
 nfs-$(CONFIG_ROOT_NFS)	+= nfsroot.o
 nfs-$(CONFIG_SYSCTL)	+= sysctl.o
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 0169eca8eb42..6d0e88096440 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -578,17 +578,12 @@ ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter)
 	if (!count)
 		goto out;
 
-	inode_lock(inode);
-	result = nfs_sync_mapping(mapping);
-	if (result)
-		goto out_unlock;
-
 	task_io_account_read(count);
 
 	result = -ENOMEM;
 	dreq = nfs_direct_req_alloc();
 	if (dreq == NULL)
-		goto out_unlock;
+		goto out;
 
 	dreq->inode = inode;
 	dreq->bytes_left = dreq->max_count = count;
@@ -603,10 +598,12 @@ ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter)
 	if (!is_sync_kiocb(iocb))
 		dreq->iocb = iocb;
 
+	nfs_start_io_direct(inode);
+
 	NFS_I(inode)->read_io += count;
 	result = nfs_direct_read_schedule_iovec(dreq, iter, iocb->ki_pos);
 
-	inode_unlock(inode);
+	nfs_end_io_direct(inode);
 
 	if (!result) {
 		result = nfs_direct_wait(dreq);
@@ -614,13 +611,8 @@ ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter)
 			iocb->ki_pos += result;
 	}
 
-	nfs_direct_req_release(dreq);
-	return result;
-
 out_release:
 	nfs_direct_req_release(dreq);
-out_unlock:
-	inode_unlock(inode);
 out:
 	return result;
 }
@@ -1008,25 +1000,12 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter)
 	pos = iocb->ki_pos;
 	end = (pos + iov_iter_count(iter) - 1) >> PAGE_SHIFT;
 
-	inode_lock(inode);
-
-	result = nfs_sync_mapping(mapping);
-	if (result)
-		goto out_unlock;
-
-	if (mapping->nrpages) {
-		result = invalidate_inode_pages2_range(mapping,
-					pos >> PAGE_SHIFT, end);
-		if (result)
-			goto out_unlock;
-	}
-
 	task_io_account_write(count);
 
 	result = -ENOMEM;
 	dreq = nfs_direct_req_alloc();
 	if (!dreq)
-		goto out_unlock;
+		goto out;
 
 	dreq->inode = inode;
 	dreq->bytes_left = dreq->max_count = count;
@@ -1041,6 +1020,8 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter)
 	if (!is_sync_kiocb(iocb))
 		dreq->iocb = iocb;
 
+	nfs_start_io_direct(inode);
+
 	result = nfs_direct_write_schedule_iovec(dreq, iter, pos);
 
 	if (mapping->nrpages) {
@@ -1048,7 +1029,7 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter)
 					      pos >> PAGE_SHIFT, end);
 	}
 
-	inode_unlock(inode);
+	nfs_end_io_direct(inode);
 
 	if (!result) {
 		result = nfs_direct_wait(dreq);
@@ -1058,13 +1039,9 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter)
 			generic_write_sync(iocb, result);
 		}
 	}
-	nfs_direct_req_release(dreq);
-	return result;
-
 out_release:
 	nfs_direct_req_release(dreq);
-out_unlock:
-	inode_unlock(inode);
+out:
 	return result;
 }
 
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 46cf0afe3c0f..9f8da9e1b23f 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -170,12 +170,14 @@ nfs_file_read(struct kiocb *iocb, struct iov_iter *to)
 		iocb->ki_filp,
 		iov_iter_count(to), (unsigned long) iocb->ki_pos);
 
-	result = nfs_revalidate_mapping_protected(inode, iocb->ki_filp->f_mapping);
+	nfs_start_io_read(inode);
+	result = nfs_revalidate_mapping(inode, iocb->ki_filp->f_mapping);
 	if (!result) {
 		result = generic_file_read_iter(iocb, to);
 		if (result > 0)
 			nfs_add_stats(inode, NFSIOS_NORMALREADBYTES, result);
 	}
+	nfs_end_io_read(inode);
 	return result;
 }
 EXPORT_SYMBOL_GPL(nfs_file_read);
@@ -191,12 +193,14 @@ nfs_file_splice_read(struct file *filp, loff_t *ppos,
 	dprintk("NFS: splice_read(%pD2, %lu@%Lu)\n",
 		filp, (unsigned long) count, (unsigned long long) *ppos);
 
-	res = nfs_revalidate_mapping_protected(inode, filp->f_mapping);
+	nfs_start_io_read(inode);
+	res = nfs_revalidate_mapping(inode, filp->f_mapping);
 	if (!res) {
 		res = generic_file_splice_read(filp, ppos, pipe, count, flags);
 		if (res > 0)
 			nfs_add_stats(inode, NFSIOS_NORMALREADBYTES, res);
 	}
+	nfs_end_io_read(inode);
 	return res;
 }
 EXPORT_SYMBOL_GPL(nfs_file_splice_read);
@@ -645,14 +649,14 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from)
 			goto out;
 	}
 
-	inode_lock(inode);
+	nfs_start_io_write(inode);
 	result = generic_write_checks(iocb, from);
 	if (result > 0) {
 		current->backing_dev_info = inode_to_bdi(inode);
 		result = generic_perform_write(file, from, iocb->ki_pos);
 		current->backing_dev_info = NULL;
 	}
-	inode_unlock(inode);
+	nfs_end_io_write(inode);
 	if (result <= 0)
 		goto out;
 
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 0eb5c924886d..159b64ede82a 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -411,6 +411,14 @@ extern void __exit unregister_nfs_fs(void);
 extern bool nfs_sb_active(struct super_block *sb);
 extern void nfs_sb_deactive(struct super_block *sb);
 
+/* io.c */
+extern void nfs_start_io_read(struct inode *inode);
+extern void nfs_end_io_read(struct inode *inode);
+extern void nfs_start_io_write(struct inode *inode);
+extern void nfs_end_io_write(struct inode *inode);
+extern void nfs_start_io_direct(struct inode *inode);
+extern void nfs_end_io_direct(struct inode *inode);
+
 /* namespace.c */
 #define NFS_PATH_CANONICAL 1
 extern char *nfs_path(char **p, struct dentry *dentry,
diff --git a/fs/nfs/io.c b/fs/nfs/io.c
new file mode 100644
index 000000000000..1fc5d1ce327e
--- /dev/null
+++ b/fs/nfs/io.c
@@ -0,0 +1,147 @@
+/*
+ * Copyright (c) 2016 Trond Myklebust
+ *
+ * I/O and data path helper functionality.
+ */
+
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/bitops.h>
+#include <linux/rwsem.h>
+#include <linux/fs.h>
+#include <linux/nfs_fs.h>
+
+#include "internal.h"
+
+/* Call with exclusively locked inode->i_rwsem */
+static void nfs_block_o_direct(struct nfs_inode *nfsi, struct inode *inode)
+{
+	if (test_bit(NFS_INO_ODIRECT, &nfsi->flags)) {
+		clear_bit(NFS_INO_ODIRECT, &nfsi->flags);
+		inode_dio_wait(inode);
+	}
+}
+
+/**
+ * nfs_start_io_read - declare the file is being used for buffered reads
+ * @inode - file inode
+ *
+ * Declare that a buffered read operation is about to start, and ensure
+ * that we block all direct I/O.
+ * On exit, the function ensures that the NFS_INO_ODIRECT flag is unset,
+ * and holds a shared lock on inode->i_rwsem to ensure that the flag
+ * cannot be changed.
+ * In practice, this means that buffered read operations are allowed to
+ * execute in parallel, thanks to the shared lock, whereas direct I/O
+ * operations need to wait to grab an exclusive lock in order to set
+ * NFS_INO_ODIRECT.
+ * Note that buffered writes and truncates both take a write lock on
+ * inode->i_rwsem, meaning that those are serialised w.r.t. the reads.
+ */
+void
+nfs_start_io_read(struct inode *inode)
+{
+	struct nfs_inode *nfsi = NFS_I(inode);
+	/* Be an optimist! */
+	down_read(&inode->i_rwsem);
+	if (test_bit(NFS_INO_ODIRECT, &nfsi->flags) == 0)
+		return;
+	up_read(&inode->i_rwsem);
+	/* Slow path.... */
+	down_write(&inode->i_rwsem);
+	nfs_block_o_direct(nfsi, inode);
+	downgrade_write(&inode->i_rwsem);
+}
+
+/**
+ * nfs_end_io_read - declare that the buffered read operation is done
+ * @inode - file inode
+ *
+ * Declare that a buffered read operation is done, and release the shared
+ * lock on inode->i_rwsem.
+ */
+void
+nfs_end_io_read(struct inode *inode)
+{
+	up_read(&inode->i_rwsem);
+}
+
+/**
+ * nfs_start_io_write - declare the file is being used for buffered writes
+ * @inode - file inode
+ *
+ * Declare that a buffered write operation is about to start, and ensure
+ * that we block all direct I/O.
+ */
+void
+nfs_start_io_write(struct inode *inode)
+{
+	down_write(&inode->i_rwsem);
+	nfs_block_o_direct(NFS_I(inode), inode);
+}
+
+/**
+ * nfs_end_io_write - declare that the buffered write operation is done
+ * @inode - file inode
+ *
+ * Declare that a buffered write operation is done, and release the
+ * lock on inode->i_rwsem.
+ */
+void
+nfs_end_io_write(struct inode *inode)
+{
+	up_write(&inode->i_rwsem);
+}
+
+/* Call with exclusively locked inode->i_rwsem */
+static void nfs_block_buffered(struct nfs_inode *nfsi, struct inode *inode)
+{
+	if (!test_bit(NFS_INO_ODIRECT, &nfsi->flags)) {
+		set_bit(NFS_INO_ODIRECT, &nfsi->flags);
+		nfs_wb_all(inode);
+	}
+}
+
+/**
+ * nfs_start_io_direct - declare the file is being used for direct I/O
+ * @inode - file inode
+ *
+ * Declare that a direct I/O operation is about to start, and ensure
+ * that we block all buffered I/O.
+ * On exit, the function ensures that the NFS_INO_ODIRECT flag is set,
+ * and holds a shared lock on inode->i_rwsem to ensure that the flag
+ * cannot be changed.
+ * In practice, this means that direct I/O operations are allowed to
+ * execute in parallel, thanks to the shared lock, whereas buffered I/O
+ * operations need to wait to grab an exclusive lock in order to clear
+ * NFS_INO_ODIRECT.
+ * Note that buffered writes and truncates both take a write lock on
+ * inode->i_rwsem, meaning that those are serialised w.r.t. O_DIRECT.
+ */
+void
+nfs_start_io_direct(struct inode *inode)
+{
+	struct nfs_inode *nfsi = NFS_I(inode);
+	/* Be an optimist! */
+	down_read(&inode->i_rwsem);
+	if (test_bit(NFS_INO_ODIRECT, &nfsi->flags) != 0)
+		return;
+	up_read(&inode->i_rwsem);
+	/* Slow path.... */
+	down_write(&inode->i_rwsem);
+	nfs_block_buffered(nfsi, inode);
+	downgrade_write(&inode->i_rwsem);
+}
+
+/**
+ * nfs_end_io_direct - declare that the direct I/O operation is done
+ * @inode - file inode
+ *
+ * Declare that a direct I/O operation is done, and release the shared
+ * lock on inode->i_rwsem.
+ */
+void
+nfs_end_io_direct(struct inode *inode)
+{
+	up_read(&inode->i_rwsem);
+}
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 120dd04b553c..225d17d35277 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -210,6 +210,7 @@ struct nfs_inode {
 #define NFS_INO_LAYOUTCOMMIT	(9)		/* layoutcommit required */
 #define NFS_INO_LAYOUTCOMMITTING (10)		/* layoutcommit inflight */
 #define NFS_INO_LAYOUTSTATS	(11)		/* layoutstats inflight */
+#define NFS_INO_ODIRECT		(12)		/* I/O setting is O_DIRECT */
 
 static inline struct nfs_inode *NFS_I(const struct inode *inode)
 {
-- 
2.7.4



* [PATCH v4 20/28] NFS: Cleanup nfs_direct_complete()
  2016-07-06 22:29                                     ` [PATCH v4 19/28] NFS: Do not serialise O_DIRECT reads and writes Trond Myklebust
@ 2016-07-06 22:29                                       ` Trond Myklebust
  2016-07-06 22:29                                         ` [PATCH v4 21/28] NFS: Remove redundant waits for O_DIRECT in fsync() and write_begin() Trond Myklebust
  0 siblings, 1 reply; 69+ messages in thread
From: Trond Myklebust @ 2016-07-06 22:29 UTC
  To: linux-nfs

There is only one caller that sets the "write" argument to true,
so just move the call to nfs_zap_mapping() into that caller and
get rid of the now redundant argument.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/direct.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 6d0e88096440..c16d33eb1ddf 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -372,13 +372,10 @@ out:
  * Synchronous I/O uses a stack-allocated iocb.  Thus we can't trust
  * the iocb is still valid here if this is a synchronous request.
  */
-static void nfs_direct_complete(struct nfs_direct_req *dreq, bool write)
+static void nfs_direct_complete(struct nfs_direct_req *dreq)
 {
 	struct inode *inode = dreq->inode;
 
-	if (write)
-		nfs_zap_mapping(inode, inode->i_mapping);
-
 	inode_dio_end(inode);
 
 	if (dreq->iocb) {
@@ -431,7 +428,7 @@ static void nfs_direct_read_completion(struct nfs_pgio_header *hdr)
 	}
 out_put:
 	if (put_dreq(dreq))
-		nfs_direct_complete(dreq, false);
+		nfs_direct_complete(dreq);
 	hdr->release(hdr);
 }
 
@@ -537,7 +534,7 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
 	}
 
 	if (put_dreq(dreq))
-		nfs_direct_complete(dreq, false);
+		nfs_direct_complete(dreq);
 	return 0;
 }
 
@@ -764,7 +761,8 @@ static void nfs_direct_write_schedule_work(struct work_struct *work)
 			nfs_direct_write_reschedule(dreq);
 			break;
 		default:
-			nfs_direct_complete(dreq, true);
+			nfs_zap_mapping(dreq->inode, dreq->inode->i_mapping);
+			nfs_direct_complete(dreq);
 	}
 }
 
-- 
2.7.4



* [PATCH v4 21/28] NFS: Remove redundant waits for O_DIRECT in fsync() and write_begin()
  2016-07-06 22:29                                       ` [PATCH v4 20/28] NFS: Cleanup nfs_direct_complete() Trond Myklebust
@ 2016-07-06 22:29                                         ` Trond Myklebust
  2016-07-06 22:29                                           ` [PATCH v4 22/28] NFS: Remove unused function nfs_revalidate_mapping_protected() Trond Myklebust
  0 siblings, 1 reply; 69+ messages in thread
From: Trond Myklebust @ 2016-07-06 22:29 UTC
  To: linux-nfs

We now wait for outstanding O_DIRECT I/O immediately after taking the
locks, so waiting again in fsync() and write_begin() is either redundant
or, when the lock is not held, potentially subject to livelock.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/file.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 9f8da9e1b23f..0e9b4a068f13 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -276,7 +276,6 @@ nfs_file_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 
 	trace_nfs_fsync_enter(inode);
 
-	inode_dio_wait(inode);
 	do {
 		ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
 		if (ret != 0)
@@ -361,11 +360,6 @@ static int nfs_write_begin(struct file *file, struct address_space *mapping,
 		file, mapping->host->i_ino, len, (long long) pos);
 
 start:
-	/*
-	 * Wait for O_DIRECT to complete
-	 */
-	inode_dio_wait(mapping->host);
-
 	page = grab_cache_page_write_begin(mapping, index, flags);
 	if (!page)
 		return -ENOMEM;
-- 
2.7.4



* [PATCH v4 22/28] NFS: Remove unused function nfs_revalidate_mapping_protected()
  2016-07-06 22:29                                         ` [PATCH v4 21/28] NFS: Remove redundant waits for O_DIRECT in fsync() and write_begin() Trond Myklebust
@ 2016-07-06 22:29                                           ` Trond Myklebust
  2016-07-06 22:30                                             ` [PATCH v4 23/28] NFS: Do not aggressively cache file attributes in the case of O_DIRECT Trond Myklebust
  0 siblings, 1 reply; 69+ messages in thread
From: Trond Myklebust @ 2016-07-06 22:29 UTC
  To: linux-nfs

Clean up...

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/inode.c         | 38 ++++----------------------------------
 include/linux/nfs_fs.h |  1 -
 2 files changed, 4 insertions(+), 35 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 6c0618eb5d57..0e0500f2bb6b 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -1131,14 +1131,12 @@ out:
 }
 
 /**
- * __nfs_revalidate_mapping - Revalidate the pagecache
+ * nfs_revalidate_mapping - Revalidate the pagecache
  * @inode - pointer to host inode
  * @mapping - pointer to mapping
- * @may_lock - take inode->i_mutex?
  */
-static int __nfs_revalidate_mapping(struct inode *inode,
-		struct address_space *mapping,
-		bool may_lock)
+int nfs_revalidate_mapping(struct inode *inode,
+		struct address_space *mapping)
 {
 	struct nfs_inode *nfsi = NFS_I(inode);
 	unsigned long *bitlock = &nfsi->flags;
@@ -1187,12 +1185,7 @@ static int __nfs_revalidate_mapping(struct inode *inode,
 	nfsi->cache_validity &= ~NFS_INO_INVALID_DATA;
 	spin_unlock(&inode->i_lock);
 	trace_nfs_invalidate_mapping_enter(inode);
-	if (may_lock) {
-		inode_lock(inode);
-		ret = nfs_invalidate_mapping(inode, mapping);
-		inode_unlock(inode);
-	} else
-		ret = nfs_invalidate_mapping(inode, mapping);
+	ret = nfs_invalidate_mapping(inode, mapping);
 	trace_nfs_invalidate_mapping_exit(inode, ret);
 
 	clear_bit_unlock(NFS_INO_INVALIDATING, bitlock);
@@ -1202,29 +1195,6 @@ out:
 	return ret;
 }
 
-/**
- * nfs_revalidate_mapping - Revalidate the pagecache
- * @inode - pointer to host inode
- * @mapping - pointer to mapping
- */
-int nfs_revalidate_mapping(struct inode *inode, struct address_space *mapping)
-{
-	return __nfs_revalidate_mapping(inode, mapping, false);
-}
-
-/**
- * nfs_revalidate_mapping_protected - Revalidate the pagecache
- * @inode - pointer to host inode
- * @mapping - pointer to mapping
- *
- * Differs from nfs_revalidate_mapping() in that it grabs the inode->i_mutex
- * while invalidating the mapping.
- */
-int nfs_revalidate_mapping_protected(struct inode *inode, struct address_space *mapping)
-{
-	return __nfs_revalidate_mapping(inode, mapping, true);
-}
-
 static bool nfs_file_has_writers(struct nfs_inode *nfsi)
 {
 	struct inode *inode = &nfsi->vfs_inode;
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 225d17d35277..810124b33327 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -351,7 +351,6 @@ extern int nfs_revalidate_inode_rcu(struct nfs_server *server, struct inode *ino
 extern int __nfs_revalidate_inode(struct nfs_server *, struct inode *);
 extern int nfs_revalidate_mapping(struct inode *inode, struct address_space *mapping);
 extern int nfs_revalidate_mapping_rcu(struct inode *inode);
-extern int nfs_revalidate_mapping_protected(struct inode *inode, struct address_space *mapping);
 extern int nfs_setattr(struct dentry *, struct iattr *);
 extern void nfs_setattr_update_inode(struct inode *inode, struct iattr *attr, struct nfs_fattr *);
 extern void nfs_setsecurity(struct inode *inode, struct nfs_fattr *fattr,
-- 
2.7.4



* [PATCH v4 23/28] NFS: Do not aggressively cache file attributes in the case of O_DIRECT
  2016-07-06 22:29                                           ` [PATCH v4 22/28] NFS: Remove unused function nfs_revalidate_mapping_protected() Trond Myklebust
@ 2016-07-06 22:30                                             ` Trond Myklebust
  2016-07-06 22:30                                               ` [PATCH v4 24/28] NFS: Getattr doesn't require data sync semantics Trond Myklebust
  0 siblings, 1 reply; 69+ messages in thread
From: Trond Myklebust @ 2016-07-06 22:30 UTC
  To: linux-nfs

A file that is open for O_DIRECT is by definition not obeying
close-to-open cache consistency semantics, so let's not cache
the attributes too aggressively either.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/inode.c    | 9 +++++++--
 fs/nfs/internal.h | 5 +++++
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 0e0500f2bb6b..7688436b19ba 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -1214,6 +1214,11 @@ static bool nfs_file_has_writers(struct nfs_inode *nfsi)
 			list)->mode & FMODE_WRITE) == FMODE_WRITE;
 }
 
+static bool nfs_file_has_buffered_writers(struct nfs_inode *nfsi)
+{
+	return nfs_file_has_writers(nfsi) && nfs_file_io_is_buffered(nfsi);
+}
+
 static unsigned long nfs_wcc_update_inode(struct inode *inode, struct nfs_fattr *fattr)
 {
 	struct nfs_inode *nfsi = NFS_I(inode);
@@ -1278,7 +1283,7 @@ static int nfs_check_inode_attributes(struct inode *inode, struct nfs_fattr *fat
 	if ((fattr->valid & NFS_ATTR_FATTR_TYPE) && (inode->i_mode & S_IFMT) != (fattr->mode & S_IFMT))
 		return -EIO;
 
-	if (!nfs_file_has_writers(nfsi)) {
+	if (!nfs_file_has_buffered_writers(nfsi)) {
 		/* Verify a few of the more important attributes */
 		if ((fattr->valid & NFS_ATTR_FATTR_CHANGE) != 0 && inode->i_version != fattr->change_attr)
 			invalid |= NFS_INO_INVALID_ATTR | NFS_INO_REVAL_PAGECACHE;
@@ -1660,7 +1665,7 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
 	unsigned long invalid = 0;
 	unsigned long now = jiffies;
 	unsigned long save_cache_validity;
-	bool have_writers = nfs_file_has_writers(nfsi);
+	bool have_writers = nfs_file_has_buffered_writers(nfsi);
 	bool cache_revalidated = true;
 
 	dfprintk(VFS, "NFS: %s(%s/%lu fh_crc=0x%08x ct=%d info=0x%x)\n",
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 159b64ede82a..01dccf18da0a 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -419,6 +419,11 @@ extern void nfs_end_io_write(struct inode *inode);
 extern void nfs_start_io_direct(struct inode *inode);
 extern void nfs_end_io_direct(struct inode *inode);
 
+static inline bool nfs_file_io_is_buffered(struct nfs_inode *nfsi)
+{
+	return test_bit(NFS_INO_ODIRECT, &nfsi->flags) == 0;
+}
+
 /* namespace.c */
 #define NFS_PATH_CANONICAL 1
 extern char *nfs_path(char **p, struct dentry *dentry,
-- 
2.7.4

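The effect of the new nfs_file_has_buffered_writers() predicate can be tabulated with a small model. The sketch below is hypothetical: `struct inode_model` and the helper names only mirror the logic shown in the diff and are not the kernel's types or functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the inode state the new predicate looks at. */
struct inode_model {
	bool has_writers;  /* an open context has FMODE_WRITE */
	bool odirect;      /* NFS_INO_ODIRECT is set */
};

static bool io_is_buffered(const struct inode_model *i)
{
	return !i->odirect;
}

static bool has_buffered_writers(const struct inode_model *i)
{
	return i->has_writers && io_is_buffered(i);
}

/*
 * Mirrors the test in nfs_check_inode_attributes(): the client only
 * skips verifying the server's attributes when it has buffered writers.
 */
static bool verify_server_attrs(const struct inode_model *i)
{
	return !has_buffered_writers(i);
}
```

With these definitions, a read-only opener and an O_DIRECT writer both keep verifying the server's change attribute. Only a writer using buffered I/O relies on its cached attributes.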


* [PATCH v4 24/28] NFS: Getattr doesn't require data sync semantics
  2016-07-06 22:30                                             ` [PATCH v4 23/28] NFS: Do not aggressively cache file attributes in the case of O_DIRECT Trond Myklebust
@ 2016-07-06 22:30                                               ` Trond Myklebust
  2016-07-06 22:30                                                 ` [PATCH v4 25/28] NFSv4.2: Fix a race in nfs42_proc_deallocate() Trond Myklebust
  2016-07-18  3:48                                                 ` [PATCH v4 24/28] NFS: Getattr doesn't require data sync semantics Christoph Hellwig
  0 siblings, 2 replies; 69+ messages in thread
From: Trond Myklebust @ 2016-07-06 22:30 UTC
  To: linux-nfs

When retrieving stat() information, NFS unfortunately does require us to
flush writes to the server in order to ensure that mtime and ctime are up
to date. However, we shouldn't have to ensure that those writes are
persisted to disk.

Relaxing that requirement does mean that we may see an mtime/ctime change
if the server reboots and forces us to replay all writes.

The exceptions to this rule are pNFS clients that are required to send
layoutcommit; however, that is dealt with by the call to pnfs_sync_inode()
in _nfs_revalidate_inode().

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/inode.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 7688436b19ba..35fda08dc4f6 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -661,9 +661,7 @@ int nfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
 	trace_nfs_getattr_enter(inode);
 	/* Flush out writes to the server in order to update c/mtime.  */
 	if (S_ISREG(inode->i_mode)) {
-		inode_lock(inode);
-		err = nfs_sync_inode(inode);
-		inode_unlock(inode);
+		err = filemap_write_and_wait(inode->i_mapping);
 		if (err)
 			goto out;
 	}
-- 
2.7.4

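The distinction this commit message draws, between flushing data to the server and persisting it there, can be illustrated with a toy model. All names below are hypothetical; the model ignores pNFS layouts that additionally require a LAYOUTCOMMIT, and it only approximates what filemap_write_and_wait() and a full data sync achieve.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical states of one cached page on an NFS client. */
enum page_state {
	PAGE_DIRTY,     /* modified only in the client's page cache */
	PAGE_UNSTABLE,  /* sent with an UNSTABLE WRITE, no COMMIT yet */
	PAGE_STABLE,    /* COMMITted, so it survives a server reboot */
};

/* Roughly what filemap_write_and_wait() achieves: nothing stays dirty. */
static void flush_to_server(enum page_state *pages, int n)
{
	for (int i = 0; i < n; i++)
		if (pages[i] == PAGE_DIRTY)
			pages[i] = PAGE_UNSTABLE;
}

/* The extra step a full data sync adds: a COMMIT to the server. */
static void commit_to_disk(enum page_state *pages, int n)
{
	for (int i = 0; i < n; i++)
		if (pages[i] == PAGE_UNSTABLE)
			pages[i] = PAGE_STABLE;
}

/* The server's mtime/ctime already account for every non-dirty page. */
static bool server_times_current(const enum page_state *pages, int n)
{
	for (int i = 0; i < n; i++)
		if (pages[i] == PAGE_DIRTY)
			return false;
	return true;
}

/* Only committed pages are guaranteed to survive a server reboot. */
static bool all_persisted(const enum page_state *pages, int n)
{
	for (int i = 0; i < n; i++)
		if (pages[i] != PAGE_STABLE)
			return false;
	return true;
}
```

In this model, stat() only needs server_times_current() to hold, which flush_to_server() alone guarantees; the COMMIT step matters for durability, not for the timestamps.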


* [PATCH v4 25/28] NFSv4.2: Fix a race in nfs42_proc_deallocate()
  2016-07-06 22:30                                               ` [PATCH v4 24/28] NFS: Getattr doesn't require data sync semantics Trond Myklebust
@ 2016-07-06 22:30                                                 ` Trond Myklebust
  2016-07-06 22:30                                                   ` [PATCH v4 26/28] NFSv4.2: Fix writeback races in nfs4_copy_file_range Trond Myklebust
  2016-07-18  3:48                                                 ` [PATCH v4 24/28] NFS: Getattr doesn't require data sync semantics Christoph Hellwig
  1 sibling, 1 reply; 69+ messages in thread
From: Trond Myklebust @ 2016-07-06 22:30 UTC
  To: linux-nfs

When punching holes in a file, we want to ensure the operation is
serialised w.r.t. other writes, meaning that we want to call
nfs_sync_inode() while holding the inode lock.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/nfs42proc.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
index aa03ed09ba06..0f9f536e647b 100644
--- a/fs/nfs/nfs42proc.c
+++ b/fs/nfs/nfs42proc.c
@@ -113,15 +113,17 @@ int nfs42_proc_deallocate(struct file *filep, loff_t offset, loff_t len)
 	if (!nfs_server_capable(inode, NFS_CAP_DEALLOCATE))
 		return -EOPNOTSUPP;
 
-	nfs_wb_all(inode);
 	inode_lock(inode);
+	err = nfs_sync_inode(inode);
+	if (err)
+		goto out_unlock;
 
 	err = nfs42_proc_fallocate(&msg, filep, offset, len);
 	if (err == 0)
 		truncate_pagecache_range(inode, offset, (offset + len) -1);
 	if (err == -EOPNOTSUPP)
 		NFS_SERVER(inode)->caps &= ~NFS_CAP_DEALLOCATE;
-
+out_unlock:
 	inode_unlock(inode);
 	return err;
 }
-- 
2.7.4

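The control flow introduced here (take the lock, flush, bail out through a single unlock label on error) can be sketched in userspace. This is a hypothetical model assuming POSIX threads: `struct file_model`, `sync_inode_model()` and `punch_hole()` stand in for the inode lock, nfs_sync_inode() and nfs42_proc_deallocate(), and no RPC is actually sent.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins; nothing here is an NFS or VFS API. */
struct file_model {
	pthread_mutex_t lock;  /* plays the role of inode_lock() */
	int sync_error;        /* what the flush (nfs_sync_inode()) returns */
	int dealloc_calls;     /* DEALLOCATE requests actually sent */
};

static int sync_inode_model(struct file_model *f)
{
	return f->sync_error;
}

/* Same shape as nfs42_proc_deallocate() after this patch. */
static int punch_hole(struct file_model *f)
{
	int err;

	pthread_mutex_lock(&f->lock);
	/* Flush while holding the lock, so no new write can sneak in. */
	err = sync_inode_model(f);
	if (err)
		goto out_unlock;

	f->dealloc_calls++;  /* the DEALLOCATE RPC would be sent here */
out_unlock:
	pthread_mutex_unlock(&f->lock);
	return err;
}
```

Because the flush happens after inode_lock() rather than before it, a concurrent writer cannot dirty the range between the flush and the hole punch, and the single out_unlock label keeps the lock balanced on the error path.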


* [PATCH v4 26/28] NFSv4.2: Fix writeback races in nfs4_copy_file_range
  2016-07-06 22:30                                                 ` [PATCH v4 25/28] NFSv4.2: Fix a race in nfs42_proc_deallocate() Trond Myklebust
@ 2016-07-06 22:30                                                   ` Trond Myklebust
  2016-07-06 22:30                                                     ` [PATCH v4 27/28] NFSv4.2: llseek(SEEK_HOLE) and llseek(SEEK_DATA) don't require data sync Trond Myklebust
  0 siblings, 1 reply; 69+ messages in thread
From: Trond Myklebust @ 2016-07-06 22:30 UTC
  To: linux-nfs

We need to ensure that any writes to the destination file are serialised
with the copy, meaning that the writeback has to occur under the inode lock.

Also relax the writeback requirement on the source, and rely on the
stateid checking to tell us if the server rebooted. Add the helper
nfs_filemap_write_and_wait_range() to call pnfs_sync_inode(), as
appropriate for pNFS servers that may need a layoutcommit.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/internal.h  |  3 +++
 fs/nfs/nfs42proc.c |  9 +++++++++
 fs/nfs/nfs4file.c  | 14 +-------------
 fs/nfs/write.c     | 18 ++++++++++++++++++
 4 files changed, 31 insertions(+), 13 deletions(-)

diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 01dccf18da0a..3b01c9146e15 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -512,6 +512,9 @@ int nfs_key_timeout_notify(struct file *filp, struct inode *inode);
 bool nfs_ctx_key_to_expire(struct nfs_open_context *ctx);
 void nfs_pageio_stop_mirroring(struct nfs_pageio_descriptor *pgio);
 
+int nfs_filemap_write_and_wait_range(struct address_space *mapping,
+		loff_t lstart, loff_t lend);
+
 #ifdef CONFIG_NFS_V4_1
 static inline
 void nfs_clear_pnfs_ds_commit_verifiers(struct pnfs_ds_commit_info *cinfo)
diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
index 0f9f536e647b..b7d457cea03f 100644
--- a/fs/nfs/nfs42proc.c
+++ b/fs/nfs/nfs42proc.c
@@ -156,11 +156,20 @@ static ssize_t _nfs42_proc_copy(struct file *src, loff_t pos_src,
 	if (status)
 		return status;
 
+	status = nfs_filemap_write_and_wait_range(file_inode(src)->i_mapping,
+			pos_src, pos_src + (loff_t)count - 1);
+	if (status)
+		return status;
+
 	status = nfs4_set_rw_stateid(&args.dst_stateid, dst_lock->open_context,
 				     dst_lock, FMODE_WRITE);
 	if (status)
 		return status;
 
+	status = nfs_sync_inode(dst_inode);
+	if (status)
+		return status;
+
 	status = nfs4_call_sync(server->client, server, &msg,
 				&args.seq_args, &res.seq_res, 0);
 	if (status == -ENOTSUPP)
diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
index 014b0e41ace5..7cdc0ab9e6f5 100644
--- a/fs/nfs/nfs4file.c
+++ b/fs/nfs/nfs4file.c
@@ -133,21 +133,9 @@ static ssize_t nfs4_copy_file_range(struct file *file_in, loff_t pos_in,
 				    struct file *file_out, loff_t pos_out,
 				    size_t count, unsigned int flags)
 {
-	struct inode *in_inode = file_inode(file_in);
-	struct inode *out_inode = file_inode(file_out);
-	int ret;
-
-	if (in_inode == out_inode)
+	if (file_inode(file_in) == file_inode(file_out))
 		return -EINVAL;
 
-	/* flush any pending writes */
-	ret = nfs_sync_inode(in_inode);
-	if (ret)
-		return ret;
-	ret = nfs_sync_inode(out_inode);
-	if (ret)
-		return ret;
-
 	return nfs42_proc_copy(file_in, pos_in, file_out, pos_out, count);
 }
 
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 3087fb6f1983..538a473b324b 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1913,6 +1913,24 @@ out_mark_dirty:
 EXPORT_SYMBOL_GPL(nfs_write_inode);
 
 /*
+ * Wrapper for filemap_write_and_wait_range()
+ *
+ * Needed for pNFS in order to ensure data becomes visible to the
+ * client.
+ */
+int nfs_filemap_write_and_wait_range(struct address_space *mapping,
+		loff_t lstart, loff_t lend)
+{
+	int ret;
+
+	ret = filemap_write_and_wait_range(mapping, lstart, lend);
+	if (ret == 0)
+		ret = pnfs_sync_inode(mapping->host, true);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(nfs_filemap_write_and_wait_range);
+
+/*
  * flush the inode to disk.
  */
 int nfs_wb_all(struct inode *inode)
-- 
2.7.4

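Two details of this patch are easy to get wrong: filemap_write_and_wait_range() takes an inclusive end offset, hence `pos_src + (loff_t)count - 1`, and the new wrapper only runs the pNFS step if writeback succeeded. The sketch below models both under stated assumptions; `writeback_range()` and `layout_sync()` are hypothetical stubs, not the kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Last byte of [pos, pos + count); filemap ranges use an inclusive end. */
static off_t range_last_byte(off_t pos, size_t count)
{
	assert(count > 0);  /* this model only handles non-empty ranges */
	return pos + (off_t)count - 1;
}

/* Hypothetical stubs for the two steps the wrapper chains together. */
static int writeback_result;  /* what the writeback step returns */
static int layout_syncs;      /* how often the pNFS step ran */

static int writeback_range(off_t start, off_t end)
{
	(void)start;
	(void)end;
	return writeback_result;
}

static int layout_sync(void)
{
	layout_syncs++;
	return 0;
}

/* Same shape as nfs_filemap_write_and_wait_range(): stop at the first error. */
static int writeback_then_layout_sync(off_t start, off_t end)
{
	int ret = writeback_range(start, end);

	if (ret == 0)
		ret = layout_sync();
	return ret;
}
```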


* [PATCH v4 27/28] NFSv4.2: llseek(SEEK_HOLE) and llseek(SEEK_DATA) don't require data sync
  2016-07-06 22:30                                                   ` [PATCH v4 26/28] NFSv4.2: Fix writeback races in nfs4_copy_file_range Trond Myklebust
@ 2016-07-06 22:30                                                     ` Trond Myklebust
  2016-07-06 22:30                                                       ` [PATCH v4 28/28] NFS nfs_vm_page_mkwrite: Don't freeze me, Bro Trond Myklebust
  0 siblings, 1 reply; 69+ messages in thread
From: Trond Myklebust @ 2016-07-06 22:30 UTC
  To: linux-nfs

We want to ensure that we write the cached data to the server, but
don't require it to be synced to disk. If the server reboots, we will
get a stateid error, which will cause us to retry anyway.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/nfs42proc.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
index b7d457cea03f..616dc254b38b 100644
--- a/fs/nfs/nfs42proc.c
+++ b/fs/nfs/nfs42proc.c
@@ -269,7 +269,11 @@ static loff_t _nfs42_proc_llseek(struct file *filep,
 	if (status)
 		return status;
 
-	nfs_wb_all(inode);
+	status = nfs_filemap_write_and_wait_range(inode->i_mapping,
+			offset, LLONG_MAX);
+	if (status)
+		return status;
+
 	status = nfs4_call_sync(server->client, server, &msg,
 				&args.seq_args, &res.seq_res, 0);
 	if (status == -ENOTSUPP)
-- 
2.7.4

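This patch changes what the client does before answering an application's lseek() with SEEK_DATA or SEEK_HOLE on an NFSv4.2 mount. For illustration only, here is a minimal sketch of such a caller. `first_data()` and `first_hole()` are hypothetical helpers; SEEK_DATA and SEEK_HOLE are Linux extensions that need `_GNU_SOURCE`, and on a local filesystem the kernel answers without any of the NFS code above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Offset of the first data byte at or after 'start', or -1 on error. */
static off_t first_data(int fd, off_t start)
{
	return lseek(fd, start, SEEK_DATA);
}

/*
 * Offset of the first hole at or after 'start', or -1 on error.
 * End of file always counts as a hole, so a fully written file
 * reports a hole at (or beyond) its size.
 */
static off_t first_hole(int fd, off_t start)
{
	return lseek(fd, start, SEEK_HOLE);
}
```

On NFS, dirty pages that have not yet reached the server would make the server's answer stale, which is why the patch still writes back the range from `offset` to LLONG_MAX before sending the SEEK operation.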


* [PATCH v4 28/28] NFS nfs_vm_page_mkwrite: Don't freeze me, Bro...
  2016-07-06 22:30                                                     ` [PATCH v4 27/28] NFSv4.2: llseek(SEEK_HOLE) and llseek(SEEK_DATA) don't require data sync Trond Myklebust
@ 2016-07-06 22:30                                                       ` Trond Myklebust
  0 siblings, 0 replies; 69+ messages in thread
From: Trond Myklebust @ 2016-07-06 22:30 UTC
  To: linux-nfs

Prevent filesystem freezes while handling the write page fault.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/file.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 0e9b4a068f13..039d58790629 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -569,6 +569,8 @@ static int nfs_vm_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 		filp, filp->f_mapping->host->i_ino,
 		(long long)page_offset(page));
 
+	sb_start_pagefault(inode->i_sb);
+
 	/* make sure the cache has finished storing the page */
 	nfs_fscache_wait_on_page_write(NFS_I(inode), page);
 
@@ -595,6 +597,7 @@ static int nfs_vm_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 out_unlock:
 	unlock_page(page);
 out:
+	sb_end_pagefault(inode->i_sb);
 	return ret;
 }
 
-- 
2.7.4

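The important property of this patch is that every exit from nfs_vm_page_mkwrite() passes through the `out:` label, so each sb_start_pagefault() is paired with exactly one sb_end_pagefault(). The sketch below models only that pairing. `struct sb_model` and its helpers are hypothetical, and the model simplifies freezing to "no page fault in progress".

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the superblock's page-fault freeze level. */
struct sb_model {
	int active_faults;  /* faults between start and end */
};

static void start_pagefault(struct sb_model *sb)
{
	sb->active_faults++;  /* the kernel may block here while frozen */
}

static void end_pagefault(struct sb_model *sb)
{
	assert(sb->active_faults > 0);
	sb->active_faults--;
}

/* Simplification: a freeze can only complete once no fault is active. */
static int can_freeze(const struct sb_model *sb)
{
	return sb->active_faults == 0;
}

/* Same control flow as nfs_vm_page_mkwrite() after the patch. */
static int page_mkwrite_model(struct sb_model *sb, int fail_with)
{
	int ret = 0;

	start_pagefault(sb);
	if (fail_with) {
		ret = fail_with;  /* e.g. the page was truncated away */
		goto out;
	}
	/* ... lock the page and mark it dirty ... */
out:
	end_pagefault(sb);
	return ret;
}
```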


* Re: [PATCH v4 24/28] NFS: Getattr doesn't require data sync semantics
  2016-07-06 22:30                                               ` [PATCH v4 24/28] NFS: Getattr doesn't require data sync semantics Trond Myklebust
  2016-07-06 22:30                                                 ` [PATCH v4 25/28] NFSv4.2: Fix a race in nfs42_proc_deallocate() Trond Myklebust
@ 2016-07-18  3:48                                                 ` Christoph Hellwig
  2016-07-18  4:32                                                   ` Trond Myklebust
  1 sibling, 1 reply; 69+ messages in thread
From: Christoph Hellwig @ 2016-07-18  3:48 UTC
  To: Trond Myklebust; +Cc: linux-nfs

On Wed, Jul 06, 2016 at 06:30:01PM -0400, Trond Myklebust wrote:
> When retrieving stat() information, NFS unfortunately does require us to
> sync writes to disk in order to ensure that mtime and ctime are up to
> date. However we shouldn't have to ensure that those writes are persisted.
> 
> Relaxing that requirement does mean that we may see an mtime/ctime change
> if the server reboots and forces us to replay all writes.
> 
> The exception to this rule are pNFS clients that are required to send
> layoutcommit, however that is dealt with by the call to pnfs_sync_inode()
> in _nfs_revalidate_inode().

This one breaks xfstests generic/207 on block/scsi layout for me.  The
reason for that is that we need a layoutcommit after writing out all
data for the file for the file size to be updated on the server.

Below is my attempt to fix this by re-adding pnfs_sync_inode to
nfs_getattr.  The call in _nfs_revalidate_inode isn't enough as it
doesn't get called in most cases we care about.

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 22a53ee..8bd04cf 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -660,11 +660,20 @@ int nfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
 	int err = 0;
 
 	trace_nfs_getattr_enter(inode);
-	/* Flush out writes to the server in order to update c/mtime.  */
+
+	/*
+	 * Flush out writes to the server in order to update c/mtime as well
+	 * as the file size.  In the pNFS case this also requires a
+	 * LAYOUTCOMMIT.
+	 */
 	if (S_ISREG(inode->i_mode)) {
 		err = filemap_write_and_wait(inode->i_mapping);
 		if (err)
 			goto out;
+
+		err = pnfs_sync_inode(inode, true);
+		if (err)
+			goto out;
 	}
 
 	/*



* Re: [PATCH v4 24/28] NFS: Getattr doesn't require data sync semantics
  2016-07-18  3:48                                                 ` [PATCH v4 24/28] NFS: Getattr doesn't require data sync semantics Christoph Hellwig
@ 2016-07-18  4:32                                                   ` Trond Myklebust
  2016-07-18  4:59                                                     ` Trond Myklebust
  0 siblings, 1 reply; 69+ messages in thread
From: Trond Myklebust @ 2016-07-18  4:32 UTC
  To: Christoph Hellwig; +Cc: List Linux

Hi Christoph,

> On Jul 17, 2016, at 23:48, Christoph Hellwig <hch@infradead.org> wrote:
> 
> On Wed, Jul 06, 2016 at 06:30:01PM -0400, Trond Myklebust wrote:
>> When retrieving stat() information, NFS unfortunately does require us to
>> sync writes to disk in order to ensure that mtime and ctime are up to
>> date. However we shouldn't have to ensure that those writes are persisted.
>> 
>> Relaxing that requirement does mean that we may see an mtime/ctime change
>> if the server reboots and forces us to replay all writes.
>> 
>> The exception to this rule are pNFS clients that are required to send
>> layoutcommit, however that is dealt with by the call to pnfs_sync_inode()
>> in _nfs_revalidate_inode().
> 
> This one breaks xfstests generic/207 on block/scsi layout for me.  The
> reason for that is that we need a layoutcommit after writing out all
> data for the file for the file size to be updated on the server.
> 
> Below is my attempt to fix this by re-adding pnfs_sync_inode to
> nfs_getattr.  The call in _nfs_revalidate_inode isn't enough as it
> doesn't get called in most cases we care about.
> 

I’m not understanding this argument. Why do we care if the file size is up to date on the server if we’re not sending an actual GETATTR on the wire to retrieve the file size?

Cheers
  Trond



* Re: [PATCH v4 24/28] NFS: Getattr doesn't require data sync semantics
  2016-07-18  4:32                                                   ` Trond Myklebust
@ 2016-07-18  4:59                                                     ` Trond Myklebust
  2016-07-19  3:58                                                       ` hch
  0 siblings, 1 reply; 69+ messages in thread
From: Trond Myklebust @ 2016-07-18  4:59 UTC
  To: hch; +Cc: linux-nfs

On Mon, 2016-07-18 at 00:32 -0400, Trond Myklebust wrote:
> Hi Christoph,
> 
> > 
> > On Jul 17, 2016, at 23:48, Christoph Hellwig <hch@infradead.org>
> > wrote:
> > 
> > On Wed, Jul 06, 2016 at 06:30:01PM -0400, Trond Myklebust wrote:
> > > 
> > > When retrieving stat() information, NFS unfortunately does
> > > require us to
> > > sync writes to disk in order to ensure that mtime and ctime are
> > > up to
> > > date. However we shouldn't have to ensure that those writes are
> > > persisted.
> > > 
> > > Relaxing that requirement does mean that we may see an
> > > mtime/ctime change
> > > if the server reboots and forces us to replay all writes.
> > > 
> > > The exception to this rule are pNFS clients that are required to
> > > send
> > > layoutcommit, however that is dealt with by the call to
> > > pnfs_sync_inode()
> > > in _nfs_revalidate_inode().
> > 
> > This one breaks xfstests generic/207 on block/scsi layout for
> > me.  The
> > reason for that is that we need a layoutcommit after writing out
> > all
> > data for the file for the file size to be updated on the server.
> > 
> > Below is my attempt to fix this by re-adding pnfs_sync_inode to
> > nfs_getattr.  The call in _nfs_revalidate_inode isn't enough as it
> > doesn't get called in most cases we care about.
> > 
> 
> I’m not understanding this argument. Why do we care if the file size
> is up to date on the server if we’re not sending an actual GETATTR on
> the wire to retrieve the file size?
> 
> Cheers
>   Trond

Actually... The problem might be that a previous attribute update is
marking the attribute cache as being revalidated. Does the following
patch help?

Cheers
  Trond

8<-----------------------------------------------------------
From 10b7e9ad44881fcd46ac24eb7374377c6e8962ed Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.myklebust@primarydata.com>
Date: Mon, 18 Jul 2016 00:51:01 -0400
Subject: [PATCH] pNFS: Don't mark the inode as revalidated if a LAYOUTCOMMIT
 is outstanding

We know that the attributes will need updating if there is still a
LAYOUTCOMMIT outstanding.

Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/inode.c | 5 ++++-
 fs/nfs/pnfs.h  | 7 +++++++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 35fda08dc4f6..9df45832e28b 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -1664,7 +1664,7 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
 	unsigned long now = jiffies;
 	unsigned long save_cache_validity;
 	bool have_writers = nfs_file_has_buffered_writers(nfsi);
-	bool cache_revalidated = true;
+	bool cache_revalidated;
 
 	dfprintk(VFS, "NFS: %s(%s/%lu fh_crc=0x%08x ct=%d info=0x%x)\n",
 			__func__, inode->i_sb->s_id, inode->i_ino,
@@ -1713,6 +1713,9 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
 	/* Do atomic weak cache consistency updates */
 	invalid |= nfs_wcc_update_inode(inode, fattr);
 
+
+	cache_revalidated = !pnfs_layoutcommit_outstanding(inode);
+
 	/* More cache consistency checks */
 	if (fattr->valid & NFS_ATTR_FATTR_CHANGE) {
 		if (inode->i_version != fattr->change_attr) {
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index d6be5299a55a..181283c4ebc3 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -629,6 +629,13 @@ pnfs_sync_inode(struct inode *inode, bool datasync)
 }
 
 static inline bool
+pnfs_layoutcommit_outstanding(struct inode *inode)
+{
+	return false;
+}
+
+
+static inline bool
 pnfs_roc(struct inode *ino)
 {
 	return false;
-- 
2.7.4


Trond Myklebust
Principal System Architect
4300 El Camino Real | Suite 100
Los Altos, CA  94022
W: 650-422-3800
C: 801-921-4583
www.primarydata.com

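The nfs_update_inode() hunk in the patch above boils down to one decision: the attribute cache is only treated as revalidated when no LAYOUTCOMMIT is outstanding. What follows is a minimal userspace sketch of that decision. It assumes hypothetical `struct model_inode` / `struct model_fattr` types standing in for the kernel's `struct inode` and `struct nfs_fattr`, and the branch for a reply without a change attribute is an assumption about the surrounding code, not something the patch shows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the client's per-inode state. */
struct model_inode {
	bool layoutcommit_outstanding;  /* pNFS data written, LAYOUTCOMMIT not yet done */
	unsigned long long change_attr; /* cached change attribute */
};

/* Hypothetical stand-in for the attributes returned by the server. */
struct model_fattr {
	bool has_change_attr;           /* reply carried a change attribute */
	unsigned long long change_attr;
};

/*
 * Sketch of the patched decision: the starting value of cache_revalidated
 * is no longer hard-coded to true; it is false while a LAYOUTCOMMIT is
 * outstanding. The "missing change attribute" branch is an assumption
 * about the surrounding code, added only for illustration.
 */
static bool model_update_inode(struct model_inode *inode,
			       const struct model_fattr *fattr)
{
	bool cache_revalidated;

	assert(inode != NULL && fattr != NULL);
	cache_revalidated = !inode->layoutcommit_outstanding;

	if (fattr->has_change_attr)
		inode->change_attr = fattr->change_attr; /* record new value */
	else
		cache_revalidated = false; /* incomplete reply: not revalidated */

	return cache_revalidated;
}
```

With `layoutcommit_outstanding` set, the sketch reports "not revalidated" even for a complete reply. Per Trond's explanation later in the thread, that is what should make a later nfs_getattr() go back to the server.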

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 24/28] NFS: Getattr doesn't require data sync semantics
  2016-07-18  4:59                                                     ` Trond Myklebust
@ 2016-07-19  3:58                                                       ` hch
  2016-07-19 20:00                                                         ` [PATCH v4 24/28] " Benjamin Coddington
  0 siblings, 1 reply; 69+ messages in thread
From: hch @ 2016-07-19  3:58 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: hch, linux-nfs

On Mon, Jul 18, 2016 at 04:59:09AM +0000, Trond Myklebust wrote:
> Actually... The problem might be that a previous attribute update is
> marking the attribute cache as being revalidated. Does the following
> patch help?

It doesn't.  Also with your most recent linux-next branch the test
now causes the systems to OOM with or without your patch (with mine it's
still fine).  Previously I tested with your writeback branch from about
two or three days ago, and with that + your patch it also 'just fails'
and doesn't OOM.  Looks like whatever causes the bug also creates
a temporary memory leak when combined with recent changes from your
tree, most likely something from the pnfs branch.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics
  2016-07-19  3:58                                                       ` hch
@ 2016-07-19 20:00                                                         ` Benjamin Coddington
  2016-07-19 20:06                                                           ` Trond Myklebust
  2016-07-19 20:09                                                           ` Benjamin Coddington
  0 siblings, 2 replies; 69+ messages in thread
From: Benjamin Coddington @ 2016-07-19 20:00 UTC (permalink / raw)
  To: hch; +Cc: Trond Myklebust, linux-nfs

On 18 Jul 2016, at 23:58, hch@infradead.org wrote:

> On Mon, Jul 18, 2016 at 04:59:09AM +0000, Trond Myklebust wrote:
>> Actually... The problem might be that a previous attribute update is
>> marking the attribute cache as being revalidated. Does the following
>> patch help?
>
> It doesn't.  Also with your most recent linux-next branch the test
> now causes the systems to OOM with or without your patch (with mine it's
> still fine).  Previously I tested with your writeback branch from about
> two or three days ago, and with that + your patch it also 'just fails'
> and doesn't OOM.  Looks like whatever causes the bug also creates
> a temporary memory leak when combined with recent changes from your
> tree, most likely something from the pnfs branch.

I couldn't find the memory leak using kmemleak, but it OOMs pretty quick.  If I
insert an mdelay(200) just after the lookup_again: marker in
pnfs_update_layout() it doesn't OOM, but it seems stuck forever in a loop on
that marker:

[ 1230.635586] pnfs_find_alloc_layout Begin ino=ffff88003ef986f8 layout=ffff8800392bca58
[ 1230.636729] pnfs_find_lseg:Begin
[ 1230.637538] pnfs_find_lseg:Return lseg           (null) ref 0
[ 1230.638582] --> send_layoutget
[ 1230.639499] --> nfs4_proc_layoutget
[ 1230.640525] --> nfs4_layoutget_prepare
[ 1230.641479] --> nfs41_setup_sequence
[ 1230.641581] <-- nfs4_proc_layoutget status=-512
[ 1230.643288] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=31
[ 1230.644348] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
[ 1230.645373] <-- nfs41_setup_sequence slotid=0 seqid=4376
[ 1230.646356] <-- nfs4_layoutget_prepare
[ 1230.647357] encode_sequence: sessionid=1468956665:2:3:0 seqid=4376 slotid=0 max_slotid=0 cache_this=0
[ 1230.648522] encode_layoutget: 1st type:0x5 iomode:2 off:122880 len:4096 mc:4096
[ 1230.650182] decode_layoutget roff:122880 rlen:4096 riomode:2, lo_type:0x5, lo.len:48
[ 1230.651331] --> nfs4_layoutget_done
[ 1230.652233] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=31
[ 1230.653409] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
[ 1230.654547] nfs4_free_slot: slotid 1 highest_used_slotid 0
[ 1230.655606] nfs41_sequence_done: Error 0 free the slot
[ 1230.656635] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
[ 1230.657739] <-- nfs4_layoutget_done
[ 1230.658650] --> nfs4_layoutget_release
[ 1230.659626] <-- nfs4_layoutget_release

This debug output is identical for every cycle of the loop. Have to stop for the
day.. more tomorrow.

Ben

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics
  2016-07-19 20:00                                                         ` [PATCH v4 24/28] " Benjamin Coddington
@ 2016-07-19 20:06                                                           ` Trond Myklebust
  2016-07-20 15:03                                                             ` Benjamin Coddington
  2016-07-19 20:09                                                           ` Benjamin Coddington
  1 sibling, 1 reply; 69+ messages in thread
From: Trond Myklebust @ 2016-07-19 20:06 UTC (permalink / raw)
  To: Coddington Benjamin; +Cc: hch, List Linux


> On Jul 19, 2016, at 16:00, Benjamin Coddington <bcodding@redhat.com> wrote:
> 
> On 18 Jul 2016, at 23:58, hch@infradead.org wrote:
> 
>> On Mon, Jul 18, 2016 at 04:59:09AM +0000, Trond Myklebust wrote:
>>> Actually... The problem might be that a previous attribute update is
>>> marking the attribute cache as being revalidated. Does the following
>>> patch help?
>> 
>> It doesn't.  Also with your most recent linux-next branch the test
>> now causes the systems to OOM with or without your patch (with mine it's
>> still fine).  Previously I tested with your writeback branch from about
>> two or three days ago, and with that + your patch it also 'just fails'
>> and doesn't OOM.  Looks like whatever causes the bug also creates
>> a temporary memory leak when combined with recent changes from your
>> tree, most likely something from the pnfs branch.
> 
> I couldn't find the memory leak using kmemleak, but it OOMs pretty quick.  If I
> insert an mdelay(200) just after the lookup_again: marker in
> pnfs_update_layout() it doesn't OOM, but it seems stuck forever in a loop on
> that marker:
> 
> [ 1230.635586] pnfs_find_alloc_layout Begin ino=ffff88003ef986f8 layout=ffff8800392bca58
> [ 1230.636729] pnfs_find_lseg:Begin
> [ 1230.637538] pnfs_find_lseg:Return lseg           (null) ref 0
> [ 1230.638582] --> send_layoutget
> [ 1230.639499] --> nfs4_proc_layoutget
> [ 1230.640525] --> nfs4_layoutget_prepare
> [ 1230.641479] --> nfs41_setup_sequence
> [ 1230.641581] <-- nfs4_proc_layoutget status=-512
> [ 1230.643288] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=31
> [ 1230.644348] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
> [ 1230.645373] <-- nfs41_setup_sequence slotid=0 seqid=4376
> [ 1230.646356] <-- nfs4_layoutget_prepare
> [ 1230.647357] encode_sequence: sessionid=1468956665:2:3:0 seqid=4376 slotid=0 max_slotid=0 cache_this=0
> [ 1230.648522] encode_layoutget: 1st type:0x5 iomode:2 off:122880 len:4096 mc:4096
> [ 1230.650182] decode_layoutget roff:122880 rlen:4096 riomode:2, lo_type:0x5, lo.len:48
> [ 1230.651331] --> nfs4_layoutget_done
> [ 1230.652233] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=31
> [ 1230.653409] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
> [ 1230.654547] nfs4_free_slot: slotid 1 highest_used_slotid 0
> [ 1230.655606] nfs41_sequence_done: Error 0 free the slot
> [ 1230.656635] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
> [ 1230.657739] <-- nfs4_layoutget_done
> [ 1230.658650] --> nfs4_layoutget_release
> [ 1230.659626] <-- nfs4_layoutget_release
> 
> This debug output is identical for every cycle of the loop. Have to stop for the
> day.. more tomorrow.
> 
> Ben
> 

Duh… It’s this patch: pNFS: Fix post-layoutget error handling in pnfs_update_layout()
We have to pass through fatal errors… I’ll fix it.

Cheers
  Trond

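Trond's diagnosis is that the retry path in pnfs_update_layout() has to pass fatal errors through instead of jumping back to lookup_again, which would explain the endless loop in Ben's log. What follows is a control-flow sketch only. It assumes a hypothetical scripted LAYOUTGET and a hypothetical retry policy, and none of the `model_*` names are kernel functions. The status=-512 in the log is the kernel-internal -ERESTARTSYS, used here merely as an example of an error that should be returned rather than retried.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Kernel-internal errno value (not exported to userspace headers). */
#define MODEL_ERESTARTSYS 512
/* Hypothetical "try again" status, chosen only for this sketch. */
#define MODEL_ERR_RETRY 1

/* Scripted stand-in for one LAYOUTGET round trip. */
struct model_script {
	const int *statuses; /* statuses to return, in order */
	size_t len;
	size_t pos;
};

static int model_layoutget(struct model_script *s)
{
	assert(s != NULL && s->len > 0);
	/* Repeat the last scripted status once the script runs out. */
	int status = s->statuses[s->pos < s->len ? s->pos : s->len - 1];
	s->pos++;
	return status;
}

/* Hypothetical policy: only -MODEL_ERR_RETRY is worth another attempt. */
static bool model_is_retryable(int status)
{
	return status == -MODEL_ERR_RETRY;
}

/*
 * Loop back (like "goto lookup_again") only for retryable errors; hand
 * success or a fatal error straight back to the caller. max_tries is a
 * safety bound for the sketch, not something taken from the kernel.
 */
static int model_update_layout(struct model_script *s, int max_tries,
			       int *tries)
{
	int status = 0;

	for (*tries = 0; *tries < max_tries;) {
		(*tries)++;
		status = model_layoutget(s);
		if (status == 0 || !model_is_retryable(status))
			break; /* success, or fatal error passed through */
	}
	return status;
}
```

A script of {-MODEL_ERR_RETRY, -MODEL_ERR_RETRY, 0} succeeds on the third try, whereas {-MODEL_ERESTARTSYS} returns after a single attempt instead of spinning.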

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics
  2016-07-19 20:00                                                         ` [PATCH v4 24/28] " Benjamin Coddington
  2016-07-19 20:06                                                           ` Trond Myklebust
@ 2016-07-19 20:09                                                           ` Benjamin Coddington
  1 sibling, 0 replies; 69+ messages in thread
From: Benjamin Coddington @ 2016-07-19 20:09 UTC (permalink / raw)
  To: hch; +Cc: Trond Myklebust, linux-nfs



On 19 Jul 2016, at 16:00, Benjamin Coddington wrote:

> On 18 Jul 2016, at 23:58, hch@infradead.org wrote:
>
>> On Mon, Jul 18, 2016 at 04:59:09AM +0000, Trond Myklebust wrote:
>>> Actually... The problem might be that a previous attribute update is
>>> marking the attribute cache as being revalidated. Does the following
>>> patch help?
>>
>> It doesn't.  Also with your most recent linux-next branch the test
>> now causes the systems to OOM with or without your patch (with mine it's
>> still fine).  Previously I tested with your writeback branch from about
>> two or three days ago, and with that + your patch it also 'just fails'
>> and doesn't OOM.  Looks like whatever causes the bug also creates
>> a temporary memory leak when combined with recent changes from your
>> tree, most likely something from the pnfs branch.
>
> I couldn't find the memory leak using kmemleak, but it OOMs pretty quick.  If I
> insert an mdelay(200) just after the lookup_again: marker in
> pnfs_update_layout() it doesn't OOM, but it seems stuck forever in a loop on
> that marker:
>
> [ 1230.635586] pnfs_find_alloc_layout Begin ino=ffff88003ef986f8 layout=ffff8800392bca58
> [ 1230.636729] pnfs_find_lseg:Begin
> [ 1230.637538] pnfs_find_lseg:Return lseg           (null) ref 0
> [ 1230.638582] --> send_layoutget
> [ 1230.639499] --> nfs4_proc_layoutget
> [ 1230.640525] --> nfs4_layoutget_prepare
> [ 1230.641479] --> nfs41_setup_sequence
> [ 1230.641581] <-- nfs4_proc_layoutget status=-512
> [ 1230.643288] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=31
> [ 1230.644348] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
> [ 1230.645373] <-- nfs41_setup_sequence slotid=0 seqid=4376
> [ 1230.646356] <-- nfs4_layoutget_prepare
> [ 1230.647357] encode_sequence: sessionid=1468956665:2:3:0 seqid=4376 slotid=0 max_slotid=0 cache_this=0
> [ 1230.648522] encode_layoutget: 1st type:0x5 iomode:2 off:122880 len:4096 mc:4096
> [ 1230.650182] decode_layoutget roff:122880 rlen:4096 riomode:2, lo_type:0x5, lo.len:48
> [ 1230.651331] --> nfs4_layoutget_done
> [ 1230.652233] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=31
> [ 1230.653409] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
> [ 1230.654547] nfs4_free_slot: slotid 1 highest_used_slotid 0
> [ 1230.655606] nfs41_sequence_done: Error 0 free the slot
> [ 1230.656635] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
> [ 1230.657739] <-- nfs4_layoutget_done
> [ 1230.658650] --> nfs4_layoutget_release
> [ 1230.659626] <-- nfs4_layoutget_release
>
> This debug output is identical for every cycle of the loop.

Except for the monotonically incrementing sequence id!  sorry..  :/

Ben

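The slot lines in the log (used_slots as a bitmap, highest_used printed as 4294967295 when no slot is busy) together with Ben's note that the sequence id keeps incrementing can be read against a simplified model of an NFSv4.1 session slot table. This model was written for illustration and is not the kernel's nfs4_alloc_slot()/nfs4_free_slot(). The starting seqid of 1 and bumping the seqid when a slot is released are simplifying assumptions, since the real client's bookkeeping differs in detail.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_MAX_SLOTS 32
#define MODEL_NO_SLOT UINT32_MAX /* shows up as 4294967295 in the log */

/* Simplified model of an NFSv4.1 session slot table. */
struct model_slot_table {
	uint32_t used_slots;             /* bit n set => slot n in use */
	uint32_t highest_used;           /* MODEL_NO_SLOT when none in use */
	uint32_t seqid[MODEL_MAX_SLOTS]; /* per-slot sequence id */
};

static void model_slot_table_init(struct model_slot_table *t)
{
	t->used_slots = 0;
	t->highest_used = MODEL_NO_SLOT;
	for (uint32_t i = 0; i < MODEL_MAX_SLOTS; i++)
		t->seqid[i] = 1; /* assumed starting value for the sketch */
}

/* Take the lowest free slot below max_slots, or return MODEL_NO_SLOT. */
static uint32_t model_alloc_slot(struct model_slot_table *t, uint32_t max_slots)
{
	for (uint32_t i = 0; i < max_slots && i < MODEL_MAX_SLOTS; i++) {
		if (!(t->used_slots & (UINT32_C(1) << i))) {
			t->used_slots |= UINT32_C(1) << i;
			if (t->highest_used == MODEL_NO_SLOT || i > t->highest_used)
				t->highest_used = i;
			return i;
		}
	}
	return MODEL_NO_SLOT;
}

/* Release a slot after its reply; the next request on it uses seqid + 1. */
static void model_free_slot(struct model_slot_table *t, uint32_t slot)
{
	assert(slot < MODEL_MAX_SLOTS);
	t->used_slots &= ~(UINT32_C(1) << slot);
	t->seqid[slot]++;
	t->highest_used = MODEL_NO_SLOT;
	for (uint32_t i = 0; i < MODEL_MAX_SLOTS; i++)
		if (t->used_slots & (UINT32_C(1) << i))
			t->highest_used = i;
}
```

Allocating two slots and freeing them in reverse order reproduces the used_slots=0001 / 0003 and highest_used 1, 0, 4294967295 progression seen in the log.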

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics
  2016-07-19 20:06                                                           ` Trond Myklebust
@ 2016-07-20 15:03                                                             ` Benjamin Coddington
  2016-07-21  8:22                                                               ` hch
  0 siblings, 1 reply; 69+ messages in thread
From: Benjamin Coddington @ 2016-07-20 15:03 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: hch, List Linux

On 19 Jul 2016, at 16:06, Trond Myklebust wrote:

>> On Jul 19, 2016, at 16:00, Benjamin Coddington <bcodding@redhat.com> wrote:
>>
>> On 18 Jul 2016, at 23:58, hch@infradead.org wrote:
>>
>>> On Mon, Jul 18, 2016 at 04:59:09AM +0000, Trond Myklebust wrote:
>>>> Actually... The problem might be that a previous attribute update is
>>>> marking the attribute cache as being revalidated. Does the following
>>>> patch help?
>>>
>>> It doesn't.  Also with your most recent linux-next branch the test
>>> now causes the systems to OOM with or without your patch (with mine it's
>>> still fine).  Previously I tested with your writeback branch from about
>>> two or three days ago, and with that + your patch it also 'just fails'
>>> and doesn't OOM.  Looks like whatever causes the bug also creates
>>> a temporary memory leak when combined with recent changes from your
>>> tree, most likely something from the pnfs branch.
>>
>> I couldn't find the memory leak using kmemleak, but it OOMs pretty quick.  If I
>> insert an mdelay(200) just after the lookup_again: marker in
>> pnfs_update_layout() it doesn't OOM, but it seems stuck forever in a loop on
>> that marker:
>>
>> [ 1230.635586] pnfs_find_alloc_layout Begin ino=ffff88003ef986f8 layout=ffff8800392bca58
>> [ 1230.636729] pnfs_find_lseg:Begin
>> [ 1230.637538] pnfs_find_lseg:Return lseg           (null) ref 0
>> [ 1230.638582] --> send_layoutget
>> [ 1230.639499] --> nfs4_proc_layoutget
>> [ 1230.640525] --> nfs4_layoutget_prepare
>> [ 1230.641479] --> nfs41_setup_sequence
>> [ 1230.641581] <-- nfs4_proc_layoutget status=-512
>> [ 1230.643288] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=31
>> [ 1230.644348] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
>> [ 1230.645373] <-- nfs41_setup_sequence slotid=0 seqid=4376
>> [ 1230.646356] <-- nfs4_layoutget_prepare
>> [ 1230.647357] encode_sequence: sessionid=1468956665:2:3:0 seqid=4376 slotid=0 max_slotid=0 cache_this=0
>> [ 1230.648522] encode_layoutget: 1st type:0x5 iomode:2 off:122880 len:4096 mc:4096
>> [ 1230.650182] decode_layoutget roff:122880 rlen:4096 riomode:2, lo_type:0x5, lo.len:48
>> [ 1230.651331] --> nfs4_layoutget_done
>> [ 1230.652233] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=31
>> [ 1230.653409] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
>> [ 1230.654547] nfs4_free_slot: slotid 1 highest_used_slotid 0
>> [ 1230.655606] nfs41_sequence_done: Error 0 free the slot
>> [ 1230.656635] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
>> [ 1230.657739] <-- nfs4_layoutget_done
>> [ 1230.658650] --> nfs4_layoutget_release
>> [ 1230.659626] <-- nfs4_layoutget_release
>>
>> This debug output is identical for every cycle of the loop. Have to stop for the
>> day.. more tomorrow.
>>
>> Ben
>>
>
> Duh… It’s this patch: pNFS: Fix post-layoutget error handling in
> pnfs_update_layout()
> We have to pass through fatal errors… I’ll fix it.

That indeed fixed it up, and generic/207 passes now.  Thanks!

Ben

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics
  2016-07-20 15:03                                                             ` Benjamin Coddington
@ 2016-07-21  8:22                                                               ` hch
  2016-07-21  8:32                                                                 ` Benjamin Coddington
  0 siblings, 1 reply; 69+ messages in thread
From: hch @ 2016-07-21  8:22 UTC (permalink / raw)
  To: Benjamin Coddington; +Cc: Trond Myklebust, hch, List Linux

On Wed, Jul 20, 2016 at 11:03:06AM -0400, Benjamin Coddington wrote:
> > > This debug output is identical for every cycle of the loop. Have to
> > > stop for the day.. more tomorrow.
> > >
> > > Ben
> > >
> >
> > Duh… It’s this patch: pNFS: Fix post-layoutget error handling in
> > pnfs_update_layout()
> > We have to pass through fatal errors… I’ll fix it.
>
> That indeed fixed it up, and generic/207 passes now.  Thanks!

So I spoke too soon in my last mail, generic/207 still fails for me
with Trond's linux-next tree, although much later in the test now.

Does it include the changes that are supposed to fix the issue?

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics
  2016-07-21  8:22                                                               ` hch
@ 2016-07-21  8:32                                                                 ` Benjamin Coddington
  2016-07-21  9:10                                                                   ` Benjamin Coddington
  0 siblings, 1 reply; 69+ messages in thread
From: Benjamin Coddington @ 2016-07-21  8:32 UTC (permalink / raw)
  To: hch; +Cc: Trond Myklebust, List Linux

On 21 Jul 2016, at 4:22, hch@infradead.org wrote:

> On Wed, Jul 20, 2016 at 11:03:06AM -0400, Benjamin Coddington wrote:
>>>> This debug output is identical for every cycle of the loop. Have to
>>>> stop for the day.. more tomorrow.
>>>>
>>>> Ben
>>>>
>>>
>>> Duh… It’s this patch: pNFS: Fix post-layoutget error handling in
>>> pnfs_update_layout()
>>> We have to pass through fatal errors… I’ll fix it.
>>
>> That indeed fixed it up, and generic/207 passes now.  Thanks!
>
> So I spoke too soon in my last mail, generic/207 still fails for me
> with Trond's linux-next tree, although much later in the test now.
>
> Does it include the changes that are supposed to fix the issue?

It should -- the v2 that fixed 207 for me is 
56b38a1f7c781519eef09c1668a3c97ea911f86b, the first version was 
e35c2a0b3cd062a8941d21511719391b64437427, I think.  I'll test again too.

Ben

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics
  2016-07-21  8:32                                                                 ` Benjamin Coddington
@ 2016-07-21  9:10                                                                   ` Benjamin Coddington
  2016-07-21  9:52                                                                     ` Benjamin Coddington
  0 siblings, 1 reply; 69+ messages in thread
From: Benjamin Coddington @ 2016-07-21  9:10 UTC (permalink / raw)
  To: hch; +Cc: Trond Myklebust, List Linux

On 21 Jul 2016, at 4:32, Benjamin Coddington wrote:

> On 21 Jul 2016, at 4:22, hch@infradead.org wrote:
>
>> On Wed, Jul 20, 2016 at 11:03:06AM -0400, Benjamin Coddington wrote:
>>>>> This debug output is identical for every cycle of the loop. Have to
>>>>> stop for the day.. more tomorrow.
>>>>>
>>>>> Ben
>>>>>
>>>>
>>>> Duh… It’s this patch: pNFS: Fix post-layoutget error handling in
>>>> pnfs_update_layout()
>>>> We have to pass through fatal errors… I’ll fix it.
>>>
>>> That indeed fixed it up, and generic/207 passes now.  Thanks!
>>
>> So I spoke too soon in my last mail, generic/207 still fails for me
>> with Trond's linux-next tree, although much later in the test now.
>>
>> Does it include the changes that are supposed to fix the issue?
>
> It should -- the v2 that fixed 207 for me is
> 56b38a1f7c781519eef09c1668a3c97ea911f86b, the first version was
> e35c2a0b3cd062a8941d21511719391b64437427, I think.  I'll test again too.

Looks like we're back to the original problem - it fails with the inode
size 4k less than expected.

The reason it worked for me was I had pnfs_ld debugging turned up, which
slowed things down enough to somehow catch the right size.

Looks like the right size is returned in the CLOSE, but the inode's not
getting updated.

Ben

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics
  2016-07-21  9:10                                                                   ` Benjamin Coddington
@ 2016-07-21  9:52                                                                     ` Benjamin Coddington
  2016-07-21 12:46                                                                       ` Trond Myklebust
  0 siblings, 1 reply; 69+ messages in thread
From: Benjamin Coddington @ 2016-07-21  9:52 UTC (permalink / raw)
  To: hch; +Cc: Trond Myklebust, List Linux

On 21 Jul 2016, at 5:10, Benjamin Coddington wrote:

> On 21 Jul 2016, at 4:32, Benjamin Coddington wrote:
>
>> On 21 Jul 2016, at 4:22, hch@infradead.org wrote:
>>
>>> On Wed, Jul 20, 2016 at 11:03:06AM -0400, Benjamin Coddington wrote:
>>>>>> This debug output is identical for every cycle of the loop. Have to
>>>>>> stop for the day.. more tomorrow.
>>>>>>
>>>>>> Ben
>>>>>>
>>>>>
>>>>> Duh… It’s this patch: pNFS: Fix post-layoutget error handling in
>>>>> pnfs_update_layout()
>>>>> We have to pass through fatal errors… I’ll fix it.
>>>>
>>>> That indeed fixed it up, and generic/207 passes now.  Thanks!
>>>
>>> So I spoke too soon in my last mail, generic/207 still fails for me
>>> with Trond's linux-next tree, although much later in the test now.
>>>
>>> Does it include the changes that are supposed to fix the issue?
>>
>> It should -- the v2 that fixed 207 for me is
>> 56b38a1f7c781519eef09c1668a3c97ea911f86b, the first version was
>> e35c2a0b3cd062a8941d21511719391b64437427, I think.  I'll test again too.
>
> Looks like we're back to the original problem - it fails with the
> inode size 4k less than expected.
>
> The reason it worked for me was I had pnfs_ld debugging turned up,
> which slowed things down enough to somehow catch the right size.
>
> Looks like the right size is returned in the CLOSE, but the inode's
> not getting updated.

And the size is right in the last LAYOUTCOMMIT response, of course.  Is
this the problem?

6024 static int decode_layoutcommit(struct xdr_stream *xdr,
...
6040     sizechanged = be32_to_cpup(p);
6041
6042     if (sizechanged) {
6043         /* throw away new size */
6044         p = xdr_inline_decode(xdr, 8);


Ben

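For reference, the 8 bytes that the quoted decode_layoutcommit() excerpt skips are the new size carried in the LAYOUTCOMMIT reply. In RFC 5661 that reply holds a newsize4 union: an XDR bool, followed by a 64-bit length only when the bool is true. Trond argues in his reply that discarding it should not matter because a GETATTR follows, so this userspace sketch only illustrates what the field contains. `decode_newsize4()` and `get_be32()` are hypothetical helpers written for this note, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Read one big-endian 32-bit XDR word. */
static uint32_t get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Decode the newsize4 union at the start of a LAYOUTCOMMIT4resok body:
 * a 4-byte XDR bool (ns_sizechanged), followed by an 8-byte length4
 * (ns_size) only when that bool is true. Returns the number of bytes
 * consumed, or -1 if the buffer is too short. Unlike the quoted kernel
 * excerpt, the new size is handed back instead of being skipped.
 */
static int decode_newsize4(const unsigned char *buf, size_t len,
			   bool *changed, uint64_t *new_size)
{
	assert(buf != NULL && changed != NULL && new_size != NULL);
	if (len < 4)
		return -1;
	*changed = get_be32(buf) != 0;
	if (!*changed)
		return 4;
	if (len < 12)
		return -1;
	*new_size = ((uint64_t)get_be32(buf + 4) << 32) | get_be32(buf + 8);
	return 12;
}
```

A reply encoding "size changed, new size 4096" is the 12 bytes 00 00 00 01 | 00 00 00 00 00 00 10 00.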
^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics
  2016-07-21  9:52                                                                     ` Benjamin Coddington
@ 2016-07-21 12:46                                                                       ` Trond Myklebust
  2016-07-21 13:05                                                                         ` Benjamin Coddington
  0 siblings, 1 reply; 69+ messages in thread
From: Trond Myklebust @ 2016-07-21 12:46 UTC (permalink / raw)
  To: Coddington Benjamin; +Cc: hch, List Linux


> On Jul 21, 2016, at 05:52, Benjamin Coddington <bcodding@redhat.com> wrote:
>
> On 21 Jul 2016, at 5:10, Benjamin Coddington wrote:
>
>> On 21 Jul 2016, at 4:32, Benjamin Coddington wrote:
>>
>>> On 21 Jul 2016, at 4:22, hch@infradead.org wrote:
>>>
>>>> On Wed, Jul 20, 2016 at 11:03:06AM -0400, Benjamin Coddington wrote:
>>>>>>> This debug output is identical for every cycle of the loop. Have to
>>>>>>> stop for the day.. more tomorrow.
>>>>>>>
>>>>>>> Ben
>>>>>>>
>>>>>>
>>>>>> Duh… It’s this patch: pNFS: Fix post-layoutget error handling in
>>>>>> pnfs_update_layout()
>>>>>> We have to pass through fatal errors… I’ll fix it.
>>>>>
>>>>> That indeed fixed it up, and generic/207 passes now.  Thanks!
>>>>
>>>> So I spoke too soon in my last mail, generic/207 still fails for me
>>>> with Trond's linux-next tree, although much later in the test now.
>>>>
>>>> Does it include the changes that are supposed to fix the issue?
>>>
>>> It should -- the v2 that fixed 207 for me is
>>> 56b38a1f7c781519eef09c1668a3c97ea911f86b, the first version was
>>> e35c2a0b3cd062a8941d21511719391b64437427, I think.  I'll test again too.
>>
>> Looks like we're back to the original problem - it fails with the inode
>> size 4k less than expected.
>>
>> The reason it worked for me was I had pnfs_ld debugging turned up, which
>> slowed things down enough to somehow catch the right size.
>>
>> Looks like the right size is returned in the CLOSE, but the inode's not
>> getting updated.
>
> And the size is right in the last LAYOUTCOMMIT response, of course.  Is
> this the problem?
>
> 6024 static int decode_layoutcommit(struct xdr_stream *xdr,
> ...
> 6040     sizechanged = be32_to_cpup(p);
> 6041
> 6042     if (sizechanged) {
> 6043         /* throw away new size */
> 6044         p = xdr_inline_decode(xdr, 8);
>
>
> Ben
>

That shouldn’t really matter since we have a GETATTR immediately following
the LAYOUTGET operation. Assuming that nfs4_proc_layoutcommit() actually
gets called, it is supposed to update the inode correctly.

Cheers
  Trond

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics
  2016-07-21 12:46                                                                       ` Trond Myklebust
@ 2016-07-21 13:05                                                                         ` Benjamin Coddington
  2016-07-21 13:20                                                                           ` Trond Myklebust
  0 siblings, 1 reply; 69+ messages in thread
From: Benjamin Coddington @ 2016-07-21 13:05 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: hch, List Linux

On 21 Jul 2016, at 8:46, Trond Myklebust wrote:

>> On Jul 21, 2016, at 05:52, Benjamin Coddington <bcodding@redhat.com> wrote:
>>
>> On 21 Jul 2016, at 5:10, Benjamin Coddington wrote:
>>
>>> On 21 Jul 2016, at 4:32, Benjamin Coddington wrote:
>>>
>>>> On 21 Jul 2016, at 4:22, hch@infradead.org wrote:
>>>>
>>>>> On Wed, Jul 20, 2016 at 11:03:06AM -0400, Benjamin Coddington wrote:
>>>>>>>> This debug output is identical for every cycle of the loop. Have to
>>>>>>>> stop for the day.. more tomorrow.
>>>>>>>>
>>>>>>>> Ben
>>>>>>>>
>>>>>>>
>>>>>>> Duh… It’s this patch: pNFS: Fix post-layoutget error handling in
>>>>>>> pnfs_update_layout()
>>>>>>> We have to pass through fatal errors… I’ll fix it.
>>>>>>
>>>>>> That indeed fixed it up, and generic/207 passes now.  Thanks!
>>>>>
>>>>> So I spoke too soon in my last mail, generic/207 still fails for me
>>>>> with Trond's linux-next tree, although much later in the test now.
>>>>>
>>>>> Does it include the changes that are supposed to fix the issue?
>>>>
>>>> It should -- the v2 that fixed 207 for me is
>>>> 56b38a1f7c781519eef09c1668a3c97ea911f86b, the first version was
>>>> e35c2a0b3cd062a8941d21511719391b64437427, I think.  I'll test again too.
>>>
>>> Looks like we're back to the original problem - it fails with the
>>> inode size 4k less than expected.
>>>
>>> The reason it worked for me was I had pnfs_ld debugging turned up,
>>> which slowed things down enough to somehow catch the right size.
>>>
>>> Looks like the right size is returned in the CLOSE, but the inode's
>>> not getting updated.
>>
>> And the size is right in the last LAYOUTCOMMIT response, of course.  
>> Is this the problem?
>>
>> 6024 static int decode_layoutcommit(struct xdr_stream *xdr,
>> ...
>> 6040     sizechanged = be32_to_cpup(p);
>> 6041
>> 6042     if (sizechanged) {
>> 6043         /* throw away new size */
>> 6044         p = xdr_inline_decode(xdr, 8);
>>
>>
>> Ben
>>
>
> That shouldn’t really matter since we have a GETATTR immediately 
> following the LAYOUTGET operation. Assuming that 
> nfs4_proc_layoutcommit() actually gets called, it is supposed to 
> update the inode correctly.

So back to Christoph's point earlier:

On 17 Jul 2016, at 23:48, Christoph Hellwig wrote:
> This one breaks xfstests generic/207 on block/scsi layout for me.  The
> reason for that is that we need a layoutcommit after writing out all
> data for the file for the file size to be updated on the server.

You responded:

On 18 Jul 2016, at 0:32, Trond Myklebust wrote:
> I’m not understanding this argument. Why do we care if the file size is up
> to date on the server if we’re not sending an actual GETATTR on the wire
> to retrieve the file size?

I guess the answer might be because we can get it back from the last
LAYOUTCOMMIT.

This test repeatedly appends 4k and produces this pattern on the wire:

NFS 334 V4 Call LAYOUTGET
NFS 290 V4 Reply (Call In 854) LAYOUTCOMMIT
NFS 294 V4 Call GETATTR FH: 0x4f5528b0
NFS 442 V4 Reply (Call In 858) GETATTR
NFS 374 V4 Call LAYOUTCOMMIT
NFS 314 V4 Reply (Call In 856) LAYOUTGET
NFS 334 V4 Call LAYOUTGET
NFS 290 V4 Reply (Call In 860) LAYOUTCOMMIT
NFS 294 V4 Call GETATTR FH: 0x4f5528b0
NFS 442 V4 Reply (Call In 864) GETATTR
NFS 374 V4 Call LAYOUTCOMMIT
NFS 314 V4 Reply (Call In 862) LAYOUTGET
NFS 334 V4 Call LAYOUTGET
NFS 290 V4 Reply (Call In 866) LAYOUTCOMMIT
NFS 294 V4 Call GETATTR FH: 0x4f5528b0
NFS 442 V4 Reply (Call In 870) GETATTR
NFS 314 V4 Reply (Call In 868) LAYOUTGET
NFS 374 V4 Call LAYOUTCOMMIT
NFS 290 V4 Reply (Call In 874) LAYOUTCOMMIT
NFS 314 V4 Call CLOSE StateID: 0x54d9
NFS 294 V4 Reply (Call In 876) CLOSE

That last LAYOUTCOMMIT and the CLOSE have the size we want.  The previous
GETATTR is 4k short.

Ben

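Christoph's point, quoted above, is that with a block/SCSI layout the file size is only updated on the server once a LAYOUTCOMMIT is sent after the data is written. A toy model makes the "4k short" GETATTR in the wire trace easy to reproduce. This is a deliberately simplified illustration under that single assumption, not server or client code, and the `toy_*` names are invented for this note.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model: data written through the layout lands on storage right away,
 * but the server only publishes the new file size at LAYOUTCOMMIT.
 */
struct toy_server {
	uint64_t published_size; /* what GETATTR or CLOSE would report */
	uint64_t written_end;    /* end of data already written via the layout */
};

static void toy_layout_append(struct toy_server *s, uint64_t len)
{
	s->written_end += len; /* data is on disk, size not yet published */
}

static void toy_layoutcommit(struct toy_server *s)
{
	if (s->written_end > s->published_size)
		s->published_size = s->written_end;
}

static uint64_t toy_getattr_size(const struct toy_server *s)
{
	return s->published_size;
}
```

Appending 4096 bytes, committing, appending another 4096 bytes and then issuing a GETATTR before the next LAYOUTCOMMIT yields 4096 instead of 8192, which matches the pattern Ben describes where only the final LAYOUTCOMMIT and the CLOSE carry the expected size.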
^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics
  2016-07-21 13:05                                                                         ` Benjamin Coddington
@ 2016-07-21 13:20                                                                           ` Trond Myklebust
  2016-07-21 14:00                                                                             ` Trond Myklebust
                                                                                               ` (2 more replies)
  0 siblings, 3 replies; 69+ messages in thread
From: Trond Myklebust @ 2016-07-21 13:20 UTC (permalink / raw)
  To: Coddington Benjamin; +Cc: hch, List Linux

> On Jul 21, 2016, at 09:05, Benjamin Coddington <bcodding@redhat.com> wrote:
>
> So back to Christoph's point earlier:
>
> On 17 Jul 2016, at 23:48, Christoph Hellwig wrote:
>> This one breaks xfstests generic/207 on block/scsi layout for me.  The
>> reason for that is that we need a layoutcommit after writing out all
>> data for the file for the file size to be updated on the server.
>
> You responded:
>
> On 18 Jul 2016, at 0:32, Trond Myklebust wrote:
>> I’m not understanding this argument. Why do we care if the file size is up
>> to date on the server if we’re not sending an actual GETATTR on the wire
>> to retrieve the file size?
>
> I guess the answer might be because we can get it back from the last
> LAYOUTCOMMIT.
>

The patch that I followed up with should now ensure that we do not mark
the attribute cache as up to date if there is a LAYOUTCOMMIT pending.
IOW: when the pNFS write is done, it is expected to do 2 things:

1) mark the inode for LAYOUTCOMMIT
2) mark the attribute cache as invalid (because we know the change
   attribute, mtime, ctime need to be updated)

In the case of blocks pNFS write:
The call to pnfs_set_layoutcommit() in pnfs_ld_write_done() should take
care of (1)
The call to nfs_writeback_update_inode() in nfs4_write_done_cb() should
take care of (2).

Provided that these 2 calls are performed in the above order, then any
call to nfs_getattr() which has not been preceded by a call to
nfs4_proc_layoutcommit() should trigger the call to
__nfs_revalidate_inode().

> This test has repeated appending 4k and has this pattern on the wire:
>
> NFS 334 V4 Call LAYOUTGET
> NFS 290 V4 Reply (Call In 854) LAYOUTCOMMIT
> NFS 294 V4 Call GETATTR FH: 0x4f5528b0
> NFS 442 V4 Reply (Call In 858) GETATTR
> NFS 374 V4 Call LAYOUTCOMMIT
> NFS 314 V4 Reply (Call In 856) LAYOUTGET
> NFS 334 V4 Call LAYOUTGET
> NFS 290 V4 Reply (Call In 860) LAYOUTCOMMIT
> NFS 294 V4 Call GETATTR FH: 0x4f5528b0
> NFS 442 V4 Reply (Call In 864) GETATTR
> NFS 374 V4 Call LAYOUTCOMMIT
> NFS 314 V4 Reply (Call In 862) LAYOUTGET
> NFS 334 V4 Call LAYOUTGET
> NFS 290 V4 Reply (Call In 866) LAYOUTCOMMIT
> NFS 294 V4 Call GETATTR FH: 0x4f5528b0
> NFS 442 V4 Reply (Call In 870) GETATTR
> NFS 314 V4 Reply (Call In 868) LAYOUTGET
> NFS 374 V4 Call LAYOUTCOMMIT
> NFS 290 V4 Reply (Call In 874) LAYOUTCOMMIT
> NFS 314 V4 Call CLOSE StateID: 0x54d9
> NFS 294 V4 Reply (Call In 876) CLOSE
>
> That last LAYOUTCOMMIT and the CLOSE have the size we want.  The previous
> GETATTR is 4k short.

When you say “4k short”, do you mean that it differs from the value
returned by the LAYOUTCOMMIT that immediately precedes it? It looks as
if there is a LAYOUTGET immediately following it, so presumably the
writes have not all completed yet.

Cheers
  Trond
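Trond's ordering rule (first mark the inode for LAYOUTCOMMIT, then invalidate the attribute cache) can be checked with a small model. This is a hypothetical sketch, not kernel code: `struct model_inode`, its two flags and the helper functions are invented for illustration. A write completion performs step (1) and step (2) in a chosen order; a concurrent revalidation only marks the cache up to date when no LAYOUTCOMMIT is pending (the behaviour the follow-up patch is described as providing); and each point at which that revalidation could interleave is tried.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the per-inode state discussed above. */
struct model_inode {
    bool layoutcommit_pending; /* step (1): inode marked for LAYOUTCOMMIT */
    bool attrs_invalid;        /* step (2): attribute cache marked invalid */
};

static void step_mark_layoutcommit(struct model_inode *ino)
{
    ino->layoutcommit_pending = true;
}

static void step_invalidate_attrs(struct model_inode *ino)
{
    ino->attrs_invalid = true;
}

/* A concurrent revalidation: it may fetch a stale size from the server, and
 * it marks the cache up to date only when no LAYOUTCOMMIT is pending. */
static void concurrent_revalidate(struct model_inode *ino)
{
    if (!ino->layoutcommit_pending)
        ino->attrs_invalid = false;
}

/* Run both completion steps (step (1) first if mark_first is true), with one
 * concurrent revalidation injected at pos: 0 = before both steps, 1 = between
 * them, 2 = after both. Returns true if a later getattr still has to go to
 * the server, i.e. a stale size cannot be served from the cache. */
static bool later_getattr_revalidates(bool mark_first, int pos)
{
    struct model_inode ino = { false, false }; /* cache starts valid */
    void (*steps[2])(struct model_inode *) = {
        mark_first ? step_mark_layoutcommit : step_invalidate_attrs,
        mark_first ? step_invalidate_attrs : step_mark_layoutcommit,
    };

    assert(pos >= 0 && pos <= 2);
    for (int i = 0; i <= 2; i++) {
        if (i == pos)
            concurrent_revalidate(&ino);
        if (i < 2)
            steps[i](&ino);
    }
    return ino.attrs_invalid;
}
```

With step (1) first, all three positions return true. With step (2) first, position 1 returns false: the revalidation sees an invalid cache but no pending LAYOUTCOMMIT, marks the cache up to date, and the LAYOUTCOMMIT flag is only set afterwards.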

* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics
  2016-07-21 13:20                                                                           ` Trond Myklebust
@ 2016-07-21 14:00                                                                             ` Trond Myklebust
  2016-07-21 14:02                                                                             ` Benjamin Coddington
  2016-07-25 16:26                                                                             ` Benjamin Coddington
  2 siblings, 0 replies; 69+ messages in thread
From: Trond Myklebust @ 2016-07-21 14:00 UTC (permalink / raw)
  To: Coddington Benjamin; +Cc: hch, List Linux

> On Jul 21, 2016, at 09:20, Trond Myklebust <trondmy@primarydata.com> wrote:
>
>
>> On Jul 21, 2016, at 09:05, Benjamin Coddington <bcodding@redhat.com> wrote:
>>
>> So back to Christoph's point earlier:
>>
>> On 17 Jul 2016, at 23:48, Christoph Hellwig wrote:
>>> This one breaks xfstests generic/207 on block/scsi layout for me.  The
>>> reason for that is that we need a layoutcommit after writing out all
>>> data for the file for the file size to be updated on the server.
>>
>> You responded:
>>
>> On 18 Jul 2016, at 0:32, Trond Myklebust wrote:
>>> I’m not understanding this argument. Why do we care if the file size is up
>>> to date on the server if we’re not sending an actual GETATTR on the wire
>>> to retrieve the file size?
>>
>> I guess the answer might be because we can get it back from the last
>> LAYOUTCOMMIT.
>>
>
> The patch that I followed up with should now ensure that we do not mark
> the attribute cache as up to date if there is a LAYOUTCOMMIT pending.
> IOW: when the pNFS write is done, it is expected to do 2 things:
>
> 1) mark the inode for LAYOUTCOMMIT
> 2) mark the attribute cache as invalid (because we know the change
>    attribute, mtime, ctime need to be updated)
>
> In the case of blocks pNFS write:
> The call to pnfs_set_layoutcommit() in pnfs_ld_write_done() should take
> care of (1)
> The call to nfs_writeback_update_inode() in nfs4_write_done_cb() should
> take care of (2).
>
> Provided that these 2 calls are performed in the above order, then any
> call to nfs_getattr() which has not been preceded by a call to
> nfs4_proc_layoutcommit() should trigger the call to
> __nfs_revalidate_inode().

By the way, it looks as if the ‘files’ layout type fails to do (2). I’ll
add a fix for that.

Cheers
  Trond

* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics
  2016-07-21 13:20                                                                           ` Trond Myklebust
  2016-07-21 14:00                                                                             ` Trond Myklebust
@ 2016-07-21 14:02                                                                             ` Benjamin Coddington
  2016-07-25 16:26                                                                             ` Benjamin Coddington
  2 siblings, 0 replies; 69+ messages in thread
From: Benjamin Coddington @ 2016-07-21 14:02 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: hch, List Linux

On 21 Jul 2016, at 9:20, Trond Myklebust wrote:

>> On Jul 21, 2016, at 09:05, Benjamin Coddington <bcodding@redhat.com> wrote:
>>
>> So back to Christoph's point earlier:
>>
>> On 17 Jul 2016, at 23:48, Christoph Hellwig wrote:
>>> This one breaks xfstests generic/207 on block/scsi layout for me.  The
>>> reason for that is that we need a layoutcommit after writing out all
>>> data for the file for the file size to be updated on the server.
>>
>> You responded:
>>
>> On 18 Jul 2016, at 0:32, Trond Myklebust wrote:
>>> I’m not understanding this argument. Why do we care if the file size is up
>>> to date on the server if we’re not sending an actual GETATTR on the wire
>>> to retrieve the file size?
>>
>> I guess the answer might be because we can get it back from the last
>> LAYOUTCOMMIT.
>>
>
> The patch that I followed up with should now ensure that we do not 
> mark the attribute cache as up to date if there is a LAYOUTCOMMIT 
> pending.
> IOW: when the pNFS write is done, it is expected to do 2 things:
>
> 1) mark the inode for LAYOUTCOMMIT
> 2) mark the attribute cache as invalid (because we know the change
> attribute, mtime, ctime need to be updated)
>
> In the case of blocks pNFS write:
> The call to pnfs_set_layoutcommit() in pnfs_ld_write_done() should 
> take care of (1)
> The call to nfs_writeback_update_inode() in nfs4_write_done_cb() 
> should take care of (2).
>
> Provided that these 2 calls are performed in the above order, then any 
> call to nfs_getattr() which has not been preceded by a call to 
> nfs4_proc_layoutcommit() should trigger the call to 
> __nfs_revalidate_inode().

OK, so maybe things are out of order here.  Thanks - this is helpful.

>> This test has repeated appending 4k and has this pattern on the wire:
>>
>> NFS 334 V4 Call LAYOUTGET
>> NFS 290 V4 Reply (Call In 854) LAYOUTCOMMIT
>> NFS 294 V4 Call GETATTR FH: 0x4f5528b0
>> NFS 442 V4 Reply (Call In 858) GETATTR
>> NFS 374 V4 Call LAYOUTCOMMIT
>> NFS 314 V4 Reply (Call In 856) LAYOUTGET
>> NFS 334 V4 Call LAYOUTGET
>> NFS 290 V4 Reply (Call In 860) LAYOUTCOMMIT
>> NFS 294 V4 Call GETATTR FH: 0x4f5528b0
>> NFS 442 V4 Reply (Call In 864) GETATTR
>> NFS 374 V4 Call LAYOUTCOMMIT
>> NFS 314 V4 Reply (Call In 862) LAYOUTGET
>> NFS 334 V4 Call LAYOUTGET
>> NFS 290 V4 Reply (Call In 866) LAYOUTCOMMIT
>> NFS 294 V4 Call GETATTR FH: 0x4f5528b0
>> NFS 442 V4 Reply (Call In 870) GETATTR
>> NFS 314 V4 Reply (Call In 868) LAYOUTGET
>> NFS 374 V4 Call LAYOUTCOMMIT
>> NFS 290 V4 Reply (Call In 874) LAYOUTCOMMIT
>> NFS 314 V4 Call CLOSE StateID: 0x54d9
>> NFS 294 V4 Reply (Call In 876) CLOSE
>>
>> That last LAYOUTCOMMIT and the CLOSE have the size we want.  The previous
>> GETATTR is 4k short.
>
> When you say “4k short”, do you mean that it differs from the 
> value returned by the LAYOUTCOMMIT that immediately precedes it? It 
> looks as if there is a LAYOUTGET immediately following it, so 
> presumably the writes have not all completed yet.

I meant it is 4k short of the final size returned in the LAYOUTCOMMIT
immediately following.  It is the same as the LAYOUTCOMMIT immediately
preceding.

FWIW at this point, here's a better look at the network:

tshark -r /tmp/pcap -T fields -e frame.number -e nfs.fh.hash -e nfs.opcode -e nfs.fattr4.size 'frame.number > 860'
861                 53,22,50
862   0x4f5528b0    53,22,50
863                 53,22,49,9    409600
864   0x4f5528b0    53,22,9
865                 53,22,9       409600
866   0x4f5528b0    53,22,49,9
867                 53,22,50
868   0x4f5528b0    53,22,50
869                 53,22,49,9    413696
870   0x4f5528b0    53,22,9
871                 53,22,9       413696
872                 53,22,50
873
874   0x4f5528b0    53,22,49,9
875                 53,22,49,9    417792
876   0x4f5528b0    53,22,4,9
877                 53,22,4,9     417792
...
880   0x4f5528b0    53,22,9
881                 53,22,9       417792

Now I see there's a GETATTR after the CLOSE that returns the right size --
but I really should track down what's happening to that one to see if it
is the same call that the test is making.  Unfortunately, I'm getting
pulled away again, so I'll dig more later.  Thanks for looking.

Ben
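The `nfs.opcode` column in the tshark output lists the operations of each compound by NFSv4.1 operation number. A small helper that maps the numbers seen in this capture to names (values from the `nfs_opnum4` enum in RFC 5661) makes the frames easier to read; `nfs41_op_name()` and `decode_compound()` are hypothetical names for this sketch, not part of tshark or the kernel.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* NFSv4.1 operation numbers (RFC 5661, enum nfs_opnum4) that appear in
 * the capture above; any other number is shown as "?". */
static const char *nfs41_op_name(long op)
{
    switch (op) {
    case 4:  return "CLOSE";
    case 9:  return "GETATTR";
    case 22: return "PUTFH";
    case 49: return "LAYOUTCOMMIT";
    case 50: return "LAYOUTGET";
    case 53: return "SEQUENCE";
    default: return "?";
    }
}

/* Translate a comma-separated nfs.opcode field such as "53,22,49,9" into
 * "SEQUENCE PUTFH LAYOUTCOMMIT GETATTR". Writes at most outlen bytes
 * (always NUL-terminated) and returns out. */
static char *decode_compound(const char *field, char *out, size_t outlen)
{
    size_t used = 0;

    assert(outlen > 0);
    out[0] = '\0';
    while (*field != '\0') {
        char *end;
        long op = strtol(field, &end, 10);
        if (end == field)
            break;                          /* not a number: stop */
        int n = snprintf(out + used, outlen - used, "%s%s",
                         used ? " " : "", nfs41_op_name(op));
        if (n < 0 || (size_t)n >= outlen - used)
            break;                          /* output truncated: stop */
        used += (size_t)n;
        field = (*end == ',') ? end + 1 : end;
    }
    return out;
}
```

Read this way, 53,22,50 is a LAYOUTGET compound, 53,22,9 a plain GETATTR, 53,22,49,9 a LAYOUTCOMMIT that also fetches attributes, and 53,22,4,9 a CLOSE with a GETATTR. The sizes in the replies (409600, 413696, 417792) step by exactly 4096 bytes, one appended block each.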


* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics
  2016-07-21 13:20                                                                           ` Trond Myklebust
  2016-07-21 14:00                                                                             ` Trond Myklebust
  2016-07-21 14:02                                                                             ` Benjamin Coddington
@ 2016-07-25 16:26                                                                             ` Benjamin Coddington
  2016-07-25 16:39                                                                               ` Trond Myklebust
  2 siblings, 1 reply; 69+ messages in thread
From: Benjamin Coddington @ 2016-07-25 16:26 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: hch, List Linux

On 21 Jul 2016, at 9:20, Trond Myklebust wrote:

>> On Jul 21, 2016, at 09:05, Benjamin Coddington <bcodding@redhat.com> wrote:
>>
>> So back to Christoph's point earlier:
>>
>> On 17 Jul 2016, at 23:48, Christoph Hellwig wrote:
>>> This one breaks xfstests generic/207 on block/scsi layout for me.  The
>>> reason for that is that we need a layoutcommit after writing out all
>>> data for the file for the file size to be updated on the server.
>>
>> You responded:
>>
>> On 18 Jul 2016, at 0:32, Trond Myklebust wrote:
>>> I’m not understanding this argument. Why do we care if the file size is up
>>> to date on the server if we’re not sending an actual GETATTR on the wire
>>> to retrieve the file size?
>>
>> I guess the answer might be because we can get it back from the last
>> LAYOUTCOMMIT.
>>
>
> The patch that I followed up with should now ensure that we do not 
> mark the attribute cache as up to date if there is a LAYOUTCOMMIT 
> pending.
> IOW: when the pNFS write is done, it is expected to do 2 things:
>
> 1) mark the inode for LAYOUTCOMMIT
> 2) mark the attribute cache as invalid (because we know the change
> attribute, mtime, ctime need to be updated)
>
> In the case of blocks pNFS write:
> The call to pnfs_set_layoutcommit() in pnfs_ld_write_done() should 
> take care of (1)
> The call to nfs_writeback_update_inode() in nfs4_write_done_cb() 
> should take care of (2).
>
> Provided that these 2 calls are performed in the above order, then any 
> call to nfs_getattr() which has not been preceded by a call to 
> nfs4_proc_layoutcommit() should trigger the call to 
> __nfs_revalidate_inode().

I think the problem is that a following nfs_getattr() will fail to notice
the size change in the case of a write_completion and layoutcommit
occurring after nfs_getattr() has done pnfs_sync_inode() but before it
has done nfs_update_inode().

In the failing case there are two threads: one is doing writes, the other
doing lstat on aio_complete via io_getevents(2).

For each write completion the lstat thread tries to verify the file size.

GETATTR Thread                  LAYOUTCOMMIT Thread
--------------                  --------------------
                                write_completion sets LAYOUTCOMMIT (4096@0)
--> nfs_getattr
  __nfs_revalidate_inode
   pnfs_sync_inode
   getattr sees 4096
                                write_completion sets LAYOUTCOMMIT (4096@4096)
                                sets LAYOUTCOMMITTING
                                clears LAYOUTCOMMIT
                                clears LAYOUTCOMMITTING
   nfs_refresh_inode
    nfs_update_inode size is 4096
<-- nfs_getattr

At this point the cached attributes are seen as up to date, but the
aio-dio-extend-stat program expects the second write_completion to be
reflected in the file size.

Ben
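The window in the timeline above can be reproduced deterministically. The sketch below is a hypothetical, single-threaded model: `struct model`, `stat_size()` and the other helpers are invented names, and the kernel functions they stand for are only indicated in comments. With `race` set, the second write completion and its LAYOUTCOMMIT are injected between the GETATTR reply and nfs_update_inode(), so the invalidation from that completion is overwritten and the cache is marked valid with the old size.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one inode; names are illustrative, not kernel code. */
struct model {
    long written;              /* bytes whose O_DIRECT writes have completed */
    long server_size;          /* size the server reports after LAYOUTCOMMIT */
    long cached_size;          /* client's cached file size */
    bool layoutcommit_pending;
    bool attrs_invalid;
};

/* A write completion: step (1) then step (2) from earlier in the thread. */
static void write_completion(struct model *m, long len)
{
    assert(len > 0);
    m->written += len;
    m->layoutcommit_pending = true;
    m->attrs_invalid = true;
}

/* LAYOUTCOMMIT: the server learns the new size; the pending flag clears. */
static void layoutcommit(struct model *m)
{
    if (m->layoutcommit_pending) {
        m->server_size = m->written;
        m->layoutcommit_pending = false;
    }
}

/* The stat() path. If race is true, the second write completion and its
 * LAYOUTCOMMIT land between the GETATTR reply and nfs_update_inode(). */
static long stat_size(struct model *m, bool race)
{
    if (!m->attrs_invalid)
        return m->cached_size;          /* served from the cache */
    layoutcommit(m);                    /* stands in for pnfs_sync_inode() */
    long fetched = m->server_size;      /* GETATTR reply ("getattr sees 4096") */
    if (race) {
        write_completion(m, 4096);      /* 4096@4096 completes ...          */
        layoutcommit(m);                /* ... and is committed right away  */
    }
    m->cached_size = fetched;           /* stands in for nfs_update_inode() */
    if (!m->layoutcommit_pending)
        m->attrs_invalid = false;       /* cache now considered up to date */
    return m->cached_size;
}

/* Two 4k appends; returns the size seen by the stat() issued after the
 * second completion has been reaped. */
static long size_after_second_completion(bool race)
{
    struct model m = { 0 };

    write_completion(&m, 4096);         /* 4096@0 */
    (void)stat_size(&m, race);          /* lstat racing with write #2 */
    if (!race)
        write_completion(&m, 4096);     /* write #2 completes afterwards */
    return stat_size(&m, false);
}
```

`size_after_second_completion(false)` returns 8192, while `size_after_second_completion(true)` returns 4096: the later stat() is answered from a cache that was marked valid with the size fetched before the second LAYOUTCOMMIT.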

* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics
  2016-07-25 16:26                                                                             ` Benjamin Coddington
@ 2016-07-25 16:39                                                                               ` Trond Myklebust
  2016-07-25 18:26                                                                                 ` Benjamin Coddington
  0 siblings, 1 reply; 69+ messages in thread
From: Trond Myklebust @ 2016-07-25 16:39 UTC (permalink / raw)
  To: Coddington Benjamin; +Cc: hch, List Linux NFS Mailing

> On Jul 25, 2016, at 12:26, Benjamin Coddington <bcodding@redhat.com> wrote:
>
> On 21 Jul 2016, at 9:20, Trond Myklebust wrote:
>
>>> On Jul 21, 2016, at 09:05, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>
>>> So back to Christoph's point earlier:
>>>
>>> On 17 Jul 2016, at 23:48, Christoph Hellwig wrote:
>>>> This one breaks xfstests generic/207 on block/scsi layout for me.  The
>>>> reason for that is that we need a layoutcommit after writing out all
>>>> data for the file for the file size to be updated on the server.
>>>
>>> You responded:
>>>
>>> On 18 Jul 2016, at 0:32, Trond Myklebust wrote:
>>>> I’m not understanding this argument. Why do we care if the file size is up
>>>> to date on the server if we’re not sending an actual GETATTR on the wire
>>>> to retrieve the file size?
>>>
>>> I guess the answer might be because we can get it back from the last
>>> LAYOUTCOMMIT.
>>>
>>
>> The patch that I followed up with should now ensure that we do not mark
>> the attribute cache as up to date if there is a LAYOUTCOMMIT pending.
>> IOW: when the pNFS write is done, it is expected to do 2 things:
>>
>> 1) mark the inode for LAYOUTCOMMIT
>> 2) mark the attribute cache as invalid (because we know the change
>>    attribute, mtime, ctime need to be updated)
>>
>> In the case of blocks pNFS write:
>> The call to pnfs_set_layoutcommit() in pnfs_ld_write_done() should take
>> care of (1)
>> The call to nfs_writeback_update_inode() in nfs4_write_done_cb() should
>> take care of (2).
>>
>> Provided that these 2 calls are performed in the above order, then any
>> call to nfs_getattr() which has not been preceded by a call to
>> nfs4_proc_layoutcommit() should trigger the call to
>> __nfs_revalidate_inode().
>
> I think the problem is that a following nfs_getattr() will fail to notice
> the size change in the case of a write_completion and layoutcommit
> occurring after nfs_getattr() has done pnfs_sync_inode() but before it
> has done nfs_update_inode().
>
> In the failing case there are two threads: one is doing writes, the other
> doing lstat on aio_complete via io_getevents(2).
>
> For each write completion the lstat thread tries to verify the file size.
>
> GETATTR Thread                  LAYOUTCOMMIT Thread
> --------------                  --------------------
>                                 write_completion sets LAYOUTCOMMIT (4096@0)
> --> nfs_getattr

filemap_write_and_wait()

> __nfs_revalidate_inode
>  pnfs_sync_inode

NFS_PROTO(inode)->getattr()

>  getattr sees 4096
>                                 write_completion sets LAYOUTCOMMIT (4096@4096)
>                                 sets LAYOUTCOMMITTING
>                                 clears LAYOUTCOMMIT
>                                 clears LAYOUTCOMMITTING
>  nfs_refresh_inode
>   nfs_update_inode size is 4096
> <-- nfs_getattr
>
> At this point the cached attributes are seen as up to date, but the
> aio-dio-extend-stat program expects the second write_completion to be
> reflected in the file size.
>

Why isn’t the filemap_write_and_wait() above resolving the race? I’d
expect that would move your “write completion sets LAYOUTCOMMIT” up to
before the pnfs_sync_inode().
In fact, in the patch that Christoph sent, all he was doing was moving
the pnfs_sync_inode() to immediately after that filemap_write_and_wait()
instead of relying on it in _nfs_revalidate_inode.

* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics
  2016-07-25 16:39                                                                               ` Trond Myklebust
@ 2016-07-25 18:26                                                                                 ` Benjamin Coddington
  2016-07-25 18:34                                                                                   ` Trond Myklebust
  0 siblings, 1 reply; 69+ messages in thread
From: Benjamin Coddington @ 2016-07-25 18:26 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: hch, List Linux NFS Mailing



On 25 Jul 2016, at 12:39, Trond Myklebust wrote:

>> On Jul 25, 2016, at 12:26, Benjamin Coddington <bcodding@redhat.com> wrote:
>>
>> On 21 Jul 2016, at 9:20, Trond Myklebust wrote:
>>
>>>> On Jul 21, 2016, at 09:05, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>
>>>> So back to Christoph's point earlier:
>>>>
>>>> On 17 Jul 2016, at 23:48, Christoph Hellwig wrote:
>>>>> This one breaks xfstests generic/207 on block/scsi layout for me.  The
>>>>> reason for that is that we need a layoutcommit after writing out all
>>>>> data for the file for the file size to be updated on the server.
>>>>
>>>> You responded:
>>>>
>>>> On 18 Jul 2016, at 0:32, Trond Myklebust wrote:
>>>>> I’m not understanding this argument. Why do we care if the file size is up
>>>>> to date on the server if we’re not sending an actual GETATTR on the wire
>>>>> to retrieve the file size?
>>>>
>>>> I guess the answer might be because we can get it back from the last
>>>> LAYOUTCOMMIT.
>>>>
>>>
>>> The patch that I followed up with should now ensure that we do not 
>>> mark the attribute cache as up to date if there is a LAYOUTCOMMIT 
>>> pending.
>>> IOW: when the pNFS write is done, it is expected to do 2 things:
>>>
>>> 1) mark the inode for LAYOUTCOMMIT
>>> 2) mark the attribute cache as invalid (because we know the change
>>> attribute, mtime, ctime need to be updated)
>>>
>>> In the case of blocks pNFS write:
>>> The call to pnfs_set_layoutcommit() in pnfs_ld_write_done() should 
>>> take care of (1)
>>> The call to nfs_writeback_update_inode() in nfs4_write_done_cb() 
>>> should take care of (2).
>>>
>>> Provided that these 2 calls are performed in the above order, then 
>>> any call to nfs_getattr() which has not been preceded by a call to 
>>> nfs4_proc_layoutcommit() should trigger the call to 
>>> __nfs_revalidate_inode().
>>
>> I think the problem is that a following nfs_getattr() will fail to notice
>> the size change in the case of a write_completion and layoutcommit
>> occurring after nfs_getattr() has done pnfs_sync_inode() but before it
>> has done nfs_update_inode().
>>
>> In the failing case there are two threads: one is doing writes, the other
>> doing lstat on aio_complete via io_getevents(2).
>>
>> For each write completion the lstat thread tries to verify the file size.
>>
>> GETATTR Thread                  LAYOUTCOMMIT Thread
>> --------------                  --------------------
>>                                 write_completion sets LAYOUTCOMMIT (4096@0)
>> --> nfs_getattr
>
> filemap_write_and_wait()
>
>> __nfs_revalidate_inode
>>  pnfs_sync_inode
>
> NFS_PROTO(inode)->getattr()
>
>>  getattr sees 4096
>>                                 write_completion sets LAYOUTCOMMIT (4096@4096)
>>                                 sets LAYOUTCOMMITTING
>>                                 clears LAYOUTCOMMIT
>>                                 clears LAYOUTCOMMITTING
>>  nfs_refresh_inode
>>   nfs_update_inode size is 4096
>> <-- nfs_getattr
>>
>> At this point the cached attributes are seen as up to date, but the
>> aio-dio-extend-stat program expects the second write_completion to be
>> reflected in the file size.
>>
>
> Why isn’t the filemap_write_and_wait() above resolving the race? I’d
> expect that would move your “write completion sets LAYOUTCOMMIT” up to
> before the pnfs_sync_inode().  In fact, in the patch that Christoph sent,
> all he was doing was moving the pnfs_sync_inode() to immediately after
> that filemap_write_and_wait() instead of relying on it in
> _nfs_revalidate_inode.

This is O_DIRECT, which I've failed to mention yet.  The second write
hasn't made it out of __nfs_pageio_add_request() at the time
filemap_write_and_wait() is called.  It is sleeping in
pnfs_update_layout() waiting on a LAYOUTGET, and it doesn't resume until
after filemap_write_and_wait().
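The point here is that filemap_write_and_wait() only waits for I/O tracked by the page cache, so an O_DIRECT request that is still waiting on a LAYOUTGET is not covered by it. A toy model of that distinction, under stated assumptions: `struct model_mapping`, `flush_page_cache()` and `sync_layoutcommit()` are hypothetical stand-ins for filemap_write_and_wait() and pnfs_sync_inode(), not their real behaviour in detail.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: buffered writes are tracked by the page cache, while
 * O_DIRECT requests are not. All names here are illustrative. */
struct model_mapping {
    int buffered_pending;  /* dirty pages or pages under writeback */
    int direct_pending;    /* O_DIRECT requests, e.g. waiting on LAYOUTGET */
    bool layoutcommit_pending;
};

/* Stand-in for filemap_write_and_wait(): it only drains page-cache I/O,
 * and the completed pages mark the inode for LAYOUTCOMMIT. */
static void flush_page_cache(struct model_mapping *m)
{
    if (m->buffered_pending > 0) {
        m->buffered_pending = 0;
        m->layoutcommit_pending = true;
    }
}

/* Stand-in for pnfs_sync_inode(): sends any pending LAYOUTCOMMIT. */
static void sync_layoutcommit(struct model_mapping *m)
{
    m->layoutcommit_pending = false;
}

/* After flush + sync, can a write that was already issued still complete
 * (and set LAYOUTCOMMIT again) while the GETATTR is in flight? */
static bool getattr_can_be_overtaken(int buffered, int direct)
{
    struct model_mapping m = { buffered, direct, false };

    assert(buffered >= 0 && direct >= 0);
    flush_page_cache(&m);
    sync_layoutcommit(&m);
    return m.buffered_pending + m.direct_pending > 0;
}
```

In this model, already-issued buffered writes are always settled before the GETATTR, whereas any outstanding O_DIRECT request can still complete inside the window, which matches the situation described for the second write.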

* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics
  2016-07-25 18:26                                                                                 ` Benjamin Coddington
@ 2016-07-25 18:34                                                                                   ` Trond Myklebust
  2016-07-25 18:41                                                                                     ` Benjamin Coddington
  0 siblings, 1 reply; 69+ messages in thread
From: Trond Myklebust @ 2016-07-25 18:34 UTC (permalink / raw)
  To: Coddington Benjamin; +Cc: hch, List Linux NFS Mailing

> On Jul 25, 2016, at 14:26, Benjamin Coddington <bcodding@redhat.com> wrote:
>
> On 25 Jul 2016, at 12:39, Trond Myklebust wrote:
>
>>> On Jul 25, 2016, at 12:26, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>
>>> On 21 Jul 2016, at 9:20, Trond Myklebust wrote:
>>>
>>>>> On Jul 21, 2016, at 09:05, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>>
>>>>> So back to Christoph's point earlier:
>>>>>
>>>>> On 17 Jul 2016, at 23:48, Christoph Hellwig wrote:
>>>>>> This one breaks xfstests generic/207 on block/scsi layout for me.  The
>>>>>> reason for that is that we need a layoutcommit after writing out all
>>>>>> data for the file for the file size to be updated on the server.
>>>>>
>>>>> You responded:
>>>>>
>>>>> On 18 Jul 2016, at 0:32, Trond Myklebust wrote:
>>>>>> I’m not understanding this argument. Why do we care if the file size is up
>>>>>> to date on the server if we’re not sending an actual GETATTR on the wire
>>>>>> to retrieve the file size?
>>>>>
>>>>> I guess the answer might be because we can get it back from the last
>>>>> LAYOUTCOMMIT.
>>>>>
>>>>
>>>> The patch that I followed up with should now ensure that we do not mark
>>>> the attribute cache as up to date if there is a LAYOUTCOMMIT pending.
>>>> IOW: when the pNFS write is done, it is expected to do 2 things:
>>>>
>>>> 1) mark the inode for LAYOUTCOMMIT
>>>> 2) mark the attribute cache as invalid (because we know the change
>>>>    attribute, mtime, ctime need to be updated)
>>>>
>>>> In the case of blocks pNFS write:
>>>> The call to pnfs_set_layoutcommit() in pnfs_ld_write_done() should take
>>>> care of (1)
>>>> The call to nfs_writeback_update_inode() in nfs4_write_done_cb() should
>>>> take care of (2).
>>>>
>>>> Provided that these 2 calls are performed in the above order, then any
>>>> call to nfs_getattr() which has not been preceded by a call to
>>>> nfs4_proc_layoutcommit() should trigger the call to
>>>> __nfs_revalidate_inode().
>>>
>>> I think the problem is that a following nfs_getattr() will fail to notice
>>> the size change in the case of a write_completion and layoutcommit
>>> occurring after nfs_getattr() has done pnfs_sync_inode() but before it
>>> has done nfs_update_inode().
>>>
>>> In the failing case there are two threads: one is doing writes, the other
>>> doing lstat on aio_complete via io_getevents(2).
>>>
>>> For each write completion the lstat thread tries to verify the file size.
>>>
>>> GETATTR Thread                  LAYOUTCOMMIT Thread
>>> --------------                  --------------------
>>>                                 write_completion sets LAYOUTCOMMIT (4096@0)
>>> --> nfs_getattr
>>
>> filemap_write_and_wait()
>>
>>> __nfs_revalidate_inode
>>> pnfs_sync_inode
>>
>> NFS_PROTO(inode)->getattr()
>>
>>> getattr sees 4096
>>>                                 write_completion sets LAYOUTCOMMIT (4096@4096)
>>>                                 sets LAYOUTCOMMITTING
>>>                                 clears LAYOUTCOMMIT
>>>                                 clears LAYOUTCOMMITTING
>>> nfs_refresh_inode
>>>  nfs_update_inode size is 4096
>>> <-- nfs_getattr
>>>
>>> At this point the cached attributes are seen as up to date, but the
>>> aio-dio-extend-stat program expects the second write_completion to be
>>> reflected in the file size.
>>>
>>
>> Why isn’t the filemap_write_and_wait() above resolving the race? I’d
>> expect that would move your “write completion sets LAYOUTCOMMIT” up to
>> before the pnfs_sync_inode().  In fact, in the patch that Christoph sent,
>> all he was doing was moving the pnfs_sync_inode() to immediately after
>> that filemap_write_and_wait() instead of relying on it in
>> _nfs_revalidate_inode.
>
> This is O_DIRECT, which I've failed to mention yet.  The second write
> hasn't made it out of __nfs_pageio_add_request() at the time
> filemap_write_and_wait() is called.  It is sleeping in
> pnfs_update_layout() waiting on a LAYOUTGET, and it doesn't resume until
> after filemap_write_and_wait().

Wait, so you have 1 thread doing an O_DIRECT write() and another doing a
stat() in parallel? Why would there be an expectation that the filesystem
should serialise those system calls?
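The question turns on what ordering the test can rely on. Per the description earlier in the thread, the lstat thread checks the size after each reaped completion, so its stat() is ordered after that write has finished rather than merely overlapping it. A hypothetical userspace sketch of that pattern (not the xfstests aio-dio-extend-stat source): a writer thread appends 4k blocks and publishes each completion under a mutex, and the checking thread calls fstat() only after it has observed a completion. It uses buffered pwrite() and POSIX threads instead of O_DIRECT and libaio, and a temporary file under /tmp, so it illustrates the ordering only and does not exercise pNFS.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define CHUNK   4096
#define NCHUNKS 64

struct shared {
    int fd;
    pthread_mutex_t lock;
    pthread_cond_t cond;
    off_t completed_end;   /* end offset of the newest completed write */
    int done;
};

static void *appender(void *arg)
{
    struct shared *s = arg;
    char buf[CHUNK];

    memset(buf, 'x', sizeof(buf));
    for (int i = 0; i < NCHUNKS; i++) {
        off_t off = (off_t)i * CHUNK;
        if (pwrite(s->fd, buf, CHUNK, off) != CHUNK)
            abort();
        pthread_mutex_lock(&s->lock);
        s->completed_end = off + CHUNK;  /* publish the completion */
        pthread_cond_signal(&s->cond);
        pthread_mutex_unlock(&s->lock);
    }
    pthread_mutex_lock(&s->lock);
    s->done = 1;
    pthread_cond_signal(&s->cond);
    pthread_mutex_unlock(&s->lock);
    return NULL;
}

/* Returns how many fstat() calls reported a size smaller than a completion
 * that had already been observed, or -1 if setup fails. */
static int count_short_sizes(void)
{
    char path[] = "/tmp/extend-stat-XXXXXX";
    struct shared s = { .completed_end = 0, .done = 0 };
    pthread_t t;
    off_t seen = 0;
    int short_sizes = 0;

    s.fd = mkstemp(path);
    if (s.fd < 0)
        return -1;
    unlink(path);                        /* file goes away on close */
    pthread_mutex_init(&s.lock, NULL);
    pthread_cond_init(&s.cond, NULL);
    if (pthread_create(&t, NULL, appender, &s) != 0) {
        close(s.fd);
        return -1;
    }

    pthread_mutex_lock(&s.lock);
    for (;;) {
        while (s.completed_end == seen && !s.done)
            pthread_cond_wait(&s.cond, &s.lock);
        if (s.completed_end == seen)     /* writer done, nothing new */
            break;
        assert(s.completed_end > seen);  /* completions only move forward */
        seen = s.completed_end;
        pthread_mutex_unlock(&s.lock);

        struct stat st;                  /* stat only after the completion */
        if (fstat(s.fd, &st) != 0)
            abort();
        if (st.st_size < seen)
            short_sizes++;

        pthread_mutex_lock(&s.lock);
    }
    pthread_mutex_unlock(&s.lock);

    pthread_join(t, NULL);
    pthread_cond_destroy(&s.cond);
    pthread_mutex_destroy(&s.lock);
    close(s.fd);
    return short_sizes;
}
```

On a local filesystem `count_short_sizes()` is expected to return 0. The size check only has a defined expected outcome because fstat() is issued after the completion has been observed; a stat() that merely overlaps an in-flight write() has no such ordering.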

* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics
  2016-07-25 18:34                                                                                   ` Trond Myklebust
@ 2016-07-25 18:41                                                                                     ` Benjamin Coddington
  2016-07-26 16:32                                                                                       ` Benjamin Coddington
  0 siblings, 1 reply; 69+ messages in thread
From: Benjamin Coddington @ 2016-07-25 18:41 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: hch, List Linux NFS Mailing



On 25 Jul 2016, at 14:34, Trond Myklebust wrote:

>> On Jul 25, 2016, at 14:26, Benjamin Coddington <bcodding@redhat.com> wrote:
>>
>> On 25 Jul 2016, at 12:39, Trond Myklebust wrote:
>>
>>>> On Jul 25, 2016, at 12:26, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>
>>>> On 21 Jul 2016, at 9:20, Trond Myklebust wrote:
>>>>
>>>>>> On Jul 21, 2016, at 09:05, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>>>
>>>>>> So back to Christoph's point earlier:
>>>>>>
>>>>>> On 17 Jul 2016, at 23:48, Christoph Hellwig wrote:
>>>>>>> This one breaks xfstests generic/207 on block/scsi layout for me.  The
>>>>>>> reason for that is that we need a layoutcommit after writing out all
>>>>>>> data for the file for the file size to be updated on the server.
>>>>>>
>>>>>> You responded:
>>>>>>
>>>>>> On 18 Jul 2016, at 0:32, Trond Myklebust wrote:
>>>>>>> I’m not understanding this argument. Why do we care if the file size is up
>>>>>>> to date on the server if we’re not sending an actual GETATTR on the wire
>>>>>>> to retrieve the file size?
>>>>>>
>>>>>> I guess the answer might be because we can get it back from the last
>>>>>> LAYOUTCOMMIT.
>>>>>>
>>>>>
>>>>> The patch that I followed up with should now ensure that we do not 
>>>>> mark the attribute cache as up to date if there is a LAYOUTCOMMIT 
>>>>> pending.
>>>>> IOW: when the pNFS write is done, it is expected to do 2 things:
>>>>>
>>>>> 1) mark the inode for LAYOUTCOMMIT
>>>>> 2) mark the attribute cache as invalid (because we know the change 
>>>>> attribute, mtime, ctime need to be updates)
>>>>>
>>>>> In the case of blocks pNFS write:
>>>>> The call to pnfs_set_layoutcommit() in pnfs_ld_write_done() should 
>>>>> take care of (1)
>>>>> The call to nfs_writeback_update_inode() in nfs4_write_done_cb() 
>>>>> should take care of (2).
>>>>>
>>>>> Provided that these 2 calls are performed in the above order, then 
>>>>> any call to nfs_getattr() which has not been preceded by a call to 
>>>>> nfs4_proc_layoutcommit() should trigger the call to 
>>>>> __nfs_revalidate_inode().
>>>>
>>>> I think the problem is that a following nfs_getattr() will fail to 
>>>> notice
>>>> the size change in the case of a write_completion and layoutcommit 
>>>> occuring
>>>> after nfs_getattr() has done pnfs_sync_inode() but before it has 
>>>> done
>>>> nfs_update_inode().
>>>>
>>>> In the failing case there are two threads one is doing writes, the 
>>>> other
>>>> doing lstat on aio_complete via io_getevents(2).
>>>>
>>>> For each write completion the lstat thread tries to verify the file 
>>>> size.
>>>>
>>>> GETATTR Thread                  LAYOUTCOMMIT Thread
>>>> --------------                  --------------------
>>>>                               write_completion sets LAYOUTCOMMIT 
>>>> (4096@0)
>>>> --> nfs_getattr
>>>
>>> filemap_write_and_wait()
>>>
>>>> __nfs_revalidate_inode
>>>> pnfs_sync_inode
>>>
>>> NFS_PROTO(inode)->getattr()
>>>
>>>> getattr sees 4096
>>>>                               write_completion sets LAYOUTCOMMIT 
>>>> (4096@4096)
>>>>                               sets LAYOUTCOMMITING
>>>>                               clears LAYOUTCOMMIT
>>>>                               clears LAYOUTCOMMITTING
>>>> nfs_refresh_inode
>>>>  nfs_update_inode size is 4096
>>>> <-- nfs_getattr
>>>>
>>>> At this point the cached attributes are seen as up to date, but
>>>> aio-dio-extend-stat program expects that second write_completion to 
>>>> reflect
>>>> in the file size.
>>>>
>>>
>>> Why isn’t the filemap_write_and_wait() above resolving the race? 
>>> I’d
>>> expect that would move your “write completion sets LAYOUTCOMMIT” 
>>> up to
>>> before the pnfs_sync_inode().  In fact, in the patch that Christoph 
>>> sent,
>>> all he was doing was moving the pnfs_sync_inode() to immediately 
>>> after
>>> that filemap_write_and_wait() instead of relying on it in
>>> _nfs_revalidate_inode.
>>
>> This is O_DIRECT, I've failed to mention yet.  The second write 
>> hasn't made
>> it out of __nfs_pageio_add_request() at the time 
>> filemap_write_and_wait() is
>> called.  It is sleeping in pnfs_update_layout() waiting on a 
>> LAYOUTGET and it
>> doesn't resumes until after filemap_write_and_wait().
>
> Wait, so you have 1 thread doing an O_DIRECT write() and another doing 
> a
> stat() in parallel? Why would there be an expectation that the 
> filesystem
> should serialise those system calls?

Not exactly parallel, but synchronized on aio_complete.  A comment in
generic/207's src/aio-dio-regress/aio-dio-extend-stat.c:

  /*
   * This was originally submitted to
   * http://bugzilla.kernel.org/show_bug.cgi?id=6831 by
   * Rafal Wijata <wijata@nec-labs.com>.  It caught a race in dio aio completion
   * that would call aio_complete() before the dio callers would update i_size.
   * A stat after io_getevents() would not see the new file size.
   *
   * The bug was fixed in the fs/direct-io.c completion reworking that appeared
   * in 2.6.20.  This test should fail on 2.6.19.
   */

As far as I can see, this check is the whole point of generic/207.
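The property the test relies on can be sketched in a few lines of C. This is a simplified, hypothetical stand-in rather than the xfstests source: the real aio-dio-extend-stat.c submits O_DIRECT writes through libaio (io_submit()/io_getevents()) and stats from a separate thread, whereas this sketch uses a single thread and a plain synchronous pwrite() so that it runs on any local filesystem. The helper name extend_and_check() is made up for illustration.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from aio-dio-extend-stat.c): write `len` bytes at
 * offset `off`, then check that a stat issued after the write has completed
 * already reports the extended file size.  Returns 0 if the new size is
 * visible, -1 on error or if a stale size is reported.
 */
static int extend_and_check(int fd, off_t off, size_t len)
{
    char buf[4096];
    struct stat st;

    if (len > sizeof(buf))
        return -1;
    memset(buf, 'x', len);

    /* Stand-in for io_submit() + io_getevents(): a synchronous write. */
    if (pwrite(fd, buf, len, off) != (ssize_t)len)
        return -1;

    /* The write has completed, so its size extension must be visible now. */
    if (fstat(fd, &st) != 0)
        return -1;
    return st.st_size >= off + (off_t)len ? 0 : -1;
}
```

On a local filesystem every check should pass; the failure discussed in this thread is the NFS client answering the stat from a cached size that predates the second write's LAYOUTCOMMIT.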

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics
  2016-07-25 18:41                                                                                     ` Benjamin Coddington
@ 2016-07-26 16:32                                                                                       ` Benjamin Coddington
  2016-07-26 16:35                                                                                         ` Trond Myklebust
  0 siblings, 1 reply; 69+ messages in thread
From: Benjamin Coddington @ 2016-07-26 16:32 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: hch, List Linux NFS Mailing

On 25 Jul 2016, at 14:41, Benjamin Coddington wrote:

> On 25 Jul 2016, at 14:34, Trond Myklebust wrote:
>
>>> On Jul 25, 2016, at 14:26, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>
>>> On 25 Jul 2016, at 12:39, Trond Myklebust wrote:
>>>
>>>>> On Jul 25, 2016, at 12:26, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>>
>>>>> On 21 Jul 2016, at 9:20, Trond Myklebust wrote:
>>>>>
>>>>>>> On Jul 21, 2016, at 09:05, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>>>>
>>>>>>> So back to Christoph's point earlier:
>>>>>>>
>>>>>>> On 17 Jul 2016, at 23:48, Christoph Hellwig wrote:
>>>>>>>> This one breaks xfstests generic/207 on block/scsi layout for me.  The
>>>>>>>> reason for that is that we need a layoutcommit after writing out all
>>>>>>>> data for the file for the file size to be updated on the server.
>>>>>>>
>>>>>>> You responded:
>>>>>>>
>>>>>>> On 18 Jul 2016, at 0:32, Trond Myklebust wrote:
>>>>>>>> I’m not understanding this argument. Why do we care if the file size is up
>>>>>>>> to date on the server if we’re not sending an actual GETATTR on the wire
>>>>>>>> to retrieve the file size?
>>>>>>>
>>>>>>> I guess the answer might be because we can get it back from the last
>>>>>>> LAYOUTCOMMIT.
>>>>>>>
>>>>>>
>>>>>> The patch that I followed up with should now ensure that we do not mark
>>>>>> the attribute cache as up to date if there is a LAYOUTCOMMIT pending.
>>>>>> IOW: when the pNFS write is done, it is expected to do 2 things:
>>>>>>
>>>>>> 1) mark the inode for LAYOUTCOMMIT
>>>>>> 2) mark the attribute cache as invalid (because we know the change
>>>>>>    attribute, mtime, ctime need to be updated)
>>>>>>
>>>>>> In the case of blocks pNFS write:
>>>>>> The call to pnfs_set_layoutcommit() in pnfs_ld_write_done() should take
>>>>>> care of (1).
>>>>>> The call to nfs_writeback_update_inode() in nfs4_write_done_cb() should
>>>>>> take care of (2).
>>>>>>
>>>>>> Provided that these 2 calls are performed in the above order, then any
>>>>>> call to nfs_getattr() which has not been preceded by a call to
>>>>>> nfs4_proc_layoutcommit() should trigger the call to
>>>>>> __nfs_revalidate_inode().
>>>>>
>>>>> I think the problem is that a following nfs_getattr() will fail to notice
>>>>> the size change in the case of a write_completion and layoutcommit occurring
>>>>> after nfs_getattr() has done pnfs_sync_inode() but before it has done
>>>>> nfs_update_inode().
>>>>>
>>>>> In the failing case there are two threads: one is doing writes, the other
>>>>> doing lstat on aio_complete via io_getevents(2).
>>>>>
>>>>> For each write completion the lstat thread tries to verify the file size.
>>>>>
>>>>> GETATTR Thread                  LAYOUTCOMMIT Thread
>>>>> --------------                  --------------------
>>>>>                                 write_completion sets LAYOUTCOMMIT (4096@0)
>>>>> --> nfs_getattr
>>>>
>>>> filemap_write_and_wait()
>>>>
>>>>> __nfs_revalidate_inode
>>>>> pnfs_sync_inode
>>>>
>>>> NFS_PROTO(inode)->getattr()
>>>>
>>>>> getattr sees 4096
>>>>>                                 write_completion sets LAYOUTCOMMIT (4096@4096)
>>>>>                                 sets LAYOUTCOMMITTING
>>>>>                                 clears LAYOUTCOMMIT
>>>>>                                 clears LAYOUTCOMMITTING
>>>>> nfs_refresh_inode
>>>>>  nfs_update_inode size is 4096
>>>>> <-- nfs_getattr
>>>>>
>>>>> At this point the cached attributes are seen as up to date, but the
>>>>> aio-dio-extend-stat program expects that second write_completion to be
>>>>> reflected in the file size.
>>>>>
>>>>
>>>> Why isn’t the filemap_write_and_wait() above resolving the race? I’d
>>>> expect that would move your “write completion sets LAYOUTCOMMIT” up to
>>>> before the pnfs_sync_inode().  In fact, in the patch that Christoph sent,
>>>> all he was doing was moving the pnfs_sync_inode() to immediately after
>>>> that filemap_write_and_wait() instead of relying on it in
>>>> _nfs_revalidate_inode.
>>>
>>> This is O_DIRECT, I've failed to mention yet.  The second write hasn't made
>>> it out of __nfs_pageio_add_request() at the time filemap_write_and_wait() is
>>> called.  It is sleeping in pnfs_update_layout() waiting on a LAYOUTGET and it
>>> doesn't resume until after filemap_write_and_wait().
>>
>> Wait, so you have 1 thread doing an O_DIRECT write() and another doing a
>> stat() in parallel? Why would there be an expectation that the filesystem
>> should serialise those system calls?
>
> Not exactly parallel, but synchronized on aio_complete.  A comment in
> generic/207's src/aio-dio-regress/aio-dio-extend-stat.c:
>
>   /*
>    * This was originally submitted to
>    * http://bugzilla.kernel.org/show_bug.cgi?id=6831 by
>    * Rafal Wijata <wijata@nec-labs.com>.  It caught a race in dio aio completion
>    * that would call aio_complete() before the dio callers would update i_size.
>    * A stat after io_getevents() would not see the new file size.
>    *
>    * The bug was fixed in the fs/direct-io.c completion reworking that appeared
>    * in 2.6.20.  This test should fail on 2.6.19.
>    */
>
> As far as I can see, this check is the whole point of generic/207.

This would fix it up:

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index f108d58101f8..823700f827b6 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -661,6 +661,7 @@ int nfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)

         trace_nfs_getattr_enter(inode);
         /* Flush out writes to the server in order to update c/mtime.  */
+       nfs_start_io_read(inode);
         if (S_ISREG(inode->i_mode)) {
                 err = filemap_write_and_wait(inode->i_mapping);
                 if (err)
@@ -694,6 +695,7 @@ int nfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
                         stat->blksize = NFS_SERVER(inode)->dtsize;
         }
  out:
+       nfs_end_io_read(inode);
         trace_nfs_getattr_exit(inode, err);
         return err;
  }

Trond, what do you think?  I'll take any additional silence as a sign to go
elsewhere.  :P

Ben

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics
  2016-07-26 16:32                                                                                       ` Benjamin Coddington
@ 2016-07-26 16:35                                                                                         ` Trond Myklebust
  2016-07-26 17:57                                                                                           ` Benjamin Coddington
  0 siblings, 1 reply; 69+ messages in thread
From: Trond Myklebust @ 2016-07-26 16:35 UTC (permalink / raw)
  To: Coddington Benjamin; +Cc: hch, List Linux NFS Mailing

> On Jul 26, 2016, at 12:32, Benjamin Coddington <bcodding@redhat.com> wrote:
>
> On 25 Jul 2016, at 14:41, Benjamin Coddington wrote:
>
>> On 25 Jul 2016, at 14:34, Trond Myklebust wrote:
>>
>>>> On Jul 25, 2016, at 14:26, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>
>>>> On 25 Jul 2016, at 12:39, Trond Myklebust wrote:
>>>>
>>>>>> On Jul 25, 2016, at 12:26, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>>>
>>>>>> On 21 Jul 2016, at 9:20, Trond Myklebust wrote:
>>>>>>
>>>>>>>> On Jul 21, 2016, at 09:05, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>>>>>
>>>>>>>> So back to Christoph's point earlier:
>>>>>>>>
>>>>>>>> On 17 Jul 2016, at 23:48, Christoph Hellwig wrote:
>>>>>>>>> This one breaks xfstests generic/207 on block/scsi layout for me.  The
>>>>>>>>> reason for that is that we need a layoutcommit after writing out all
>>>>>>>>> data for the file for the file size to be updated on the server.
>>>>>>>>
>>>>>>>> You responded:
>>>>>>>>
>>>>>>>> On 18 Jul 2016, at 0:32, Trond Myklebust wrote:
>>>>>>>>> I’m not understanding this argument. Why do we care if the file size is up
>>>>>>>>> to date on the server if we’re not sending an actual GETATTR on the wire
>>>>>>>>> to retrieve the file size?
>>>>>>>>
>>>>>>>> I guess the answer might be because we can get it back from the last
>>>>>>>> LAYOUTCOMMIT.
>>>>>>>>
>>>>>>>
>>>>>>> The patch that I followed up with should now ensure that we do not mark
>>>>>>> the attribute cache as up to date if there is a LAYOUTCOMMIT pending.
>>>>>>> IOW: when the pNFS write is done, it is expected to do 2 things:
>>>>>>>
>>>>>>> 1) mark the inode for LAYOUTCOMMIT
>>>>>>> 2) mark the attribute cache as invalid (because we know the change
>>>>>>>    attribute, mtime, ctime need to be updated)
>>>>>>>
>>>>>>> In the case of blocks pNFS write:
>>>>>>> The call to pnfs_set_layoutcommit() in pnfs_ld_write_done() should take
>>>>>>> care of (1).
>>>>>>> The call to nfs_writeback_update_inode() in nfs4_write_done_cb() should
>>>>>>> take care of (2).
>>>>>>>
>>>>>>> Provided that these 2 calls are performed in the above order, then any
>>>>>>> call to nfs_getattr() which has not been preceded by a call to
>>>>>>> nfs4_proc_layoutcommit() should trigger the call to
>>>>>>> __nfs_revalidate_inode().
>>>>>>
>>>>>> I think the problem is that a following nfs_getattr() will fail to notice
>>>>>> the size change in the case of a write_completion and layoutcommit occurring
>>>>>> after nfs_getattr() has done pnfs_sync_inode() but before it has done
>>>>>> nfs_update_inode().
>>>>>>
>>>>>> In the failing case there are two threads: one is doing writes, the other
>>>>>> doing lstat on aio_complete via io_getevents(2).
>>>>>>
>>>>>> For each write completion the lstat thread tries to verify the file size.
>>>>>>
>>>>>> GETATTR Thread                  LAYOUTCOMMIT Thread
>>>>>> --------------                  --------------------
>>>>>>                                 write_completion sets LAYOUTCOMMIT (4096@0)
>>>>>> --> nfs_getattr
>>>>>
>>>>> filemap_write_and_wait()
>>>>>
>>>>>> __nfs_revalidate_inode
>>>>>> pnfs_sync_inode
>>>>>
>>>>> NFS_PROTO(inode)->getattr()
>>>>>
>>>>>> getattr sees 4096
>>>>>>                                 write_completion sets LAYOUTCOMMIT (4096@4096)
>>>>>>                                 sets LAYOUTCOMMITTING
>>>>>>                                 clears LAYOUTCOMMIT
>>>>>>                                 clears LAYOUTCOMMITTING
>>>>>> nfs_refresh_inode
>>>>>>  nfs_update_inode size is 4096
>>>>>> <-- nfs_getattr
>>>>>>
>>>>>> At this point the cached attributes are seen as up to date, but the
>>>>>> aio-dio-extend-stat program expects that second write_completion to be
>>>>>> reflected in the file size.
>>>>>>
>>>>>
>>>>> Why isn’t the filemap_write_and_wait() above resolving the race? I’d
>>>>> expect that would move your “write completion sets LAYOUTCOMMIT” up to
>>>>> before the pnfs_sync_inode().  In fact, in the patch that Christoph sent,
>>>>> all he was doing was moving the pnfs_sync_inode() to immediately after
>>>>> that filemap_write_and_wait() instead of relying on it in
>>>>> _nfs_revalidate_inode.
>>>>
>>>> This is O_DIRECT, I've failed to mention yet.  The second write hasn't made
>>>> it out of __nfs_pageio_add_request() at the time filemap_write_and_wait() is
>>>> called.  It is sleeping in pnfs_update_layout() waiting on a LAYOUTGET and it
>>>> doesn't resume until after filemap_write_and_wait().
>>>
>>> Wait, so you have 1 thread doing an O_DIRECT write() and another doing a
>>> stat() in parallel? Why would there be an expectation that the filesystem
>>> should serialise those system calls?
>>
>> Not exactly parallel, but synchronized on aio_complete.  A comment in
>> generic/207's src/aio-dio-regress/aio-dio-extend-stat.c:
>>
>>   /*
>>    * This was originally submitted to
>>    * http://bugzilla.kernel.org/show_bug.cgi?id=6831 by
>>    * Rafal Wijata <wijata@nec-labs.com>.  It caught a race in dio aio completion
>>    * that would call aio_complete() before the dio callers would update i_size.
>>    * A stat after io_getevents() would not see the new file size.
>>    *
>>    * The bug was fixed in the fs/direct-io.c completion reworking that appeared
>>    * in 2.6.20.  This test should fail on 2.6.19.
>>    */
>>
>> As far as I can see, this check is the whole point of generic/207.
>
> This would fix it up:
>
> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> index f108d58101f8..823700f827b6 100644
> --- a/fs/nfs/inode.c
> +++ b/fs/nfs/inode.c
> @@ -661,6 +661,7 @@ int nfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
>
>        trace_nfs_getattr_enter(inode);
>        /* Flush out writes to the server in order to update c/mtime.  */
> +       nfs_start_io_read(inode);
>        if (S_ISREG(inode->i_mode)) {
>                err = filemap_write_and_wait(inode->i_mapping);
>                if (err)
> @@ -694,6 +695,7 @@ int nfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
>                        stat->blksize = NFS_SERVER(inode)->dtsize;
>        }
> out:
> +       nfs_end_io_read(inode);
>        trace_nfs_getattr_exit(inode, err);
>        return err;
> }
>
> Trond, what do you think?  I'll take any additional silence as a sign to go
> elsewhere.  :P

No. The above locking excludes all writes as well as O_DIRECT reads… That’s
worse than we had before.

I’d rather understand _why_ the aio_complete() is failing to work correctly
here. According to the analysis of the test case that you quoted yesterday,
the O_DIRECT writes should have completed before we even call stat().

Cheers
  Trond
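Trond's objection to taking the I/O lock in nfs_getattr() can be illustrated with a toy, single-threaded model of the buffered/O_DIRECT exclusion introduced earlier in this series. The model assumes, as we read the nfs_start_io_read()/nfs_start_io_direct()/nfs_start_io_write() helpers, that buffered readers share the inode lock with each other, O_DIRECT requests share it with each other, switching between the two modes needs exclusive access, and buffered writers always take it exclusively. The struct and function names are hypothetical, and the model only reports whether an entry would be granted immediately; it does not block.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy, single-threaded model of the inode I/O lock modes (hypothetical
 * names).  Assumptions: buffered readers share the lock with each other,
 * O_DIRECT requests share it with each other, switching between the two
 * modes needs exclusive access, and buffered writers are always exclusive.
 * Each try_* function reports whether entry would be granted immediately.
 */
struct io_lock {
    int  shared;    /* number of shared holders (buffered or O_DIRECT) */
    bool excl;      /* an exclusive holder (buffered writer) is present */
    bool odirect;   /* which mode the shared holders are in */
};

/* Buffered, shared entry: what nfs_getattr() would take with the patch. */
static bool try_start_read(struct io_lock *l)
{
    if (l->excl)
        return false;
    if (l->odirect) {
        if (l->shared > 0)
            return false;   /* O_DIRECT holders present: cannot switch */
        l->odirect = false;
    }
    l->shared++;
    return true;
}

/* O_DIRECT read or write: shared, but only in O_DIRECT mode. */
static bool try_start_direct(struct io_lock *l)
{
    if (l->excl)
        return false;
    if (!l->odirect) {
        if (l->shared > 0)
            return false;   /* buffered holders present: cannot switch */
        l->odirect = true;
    }
    l->shared++;
    return true;
}

/* Buffered write: exclusive. */
static bool try_start_write(struct io_lock *l)
{
    if (l->excl || l->shared > 0)
        return false;
    l->excl = true;
    return true;
}

static void end_shared(struct io_lock *l)
{
    assert(l->shared > 0);
    l->shared--;
}

static void end_write(struct io_lock *l)
{
    assert(l->excl);
    l->excl = false;
}
```

Under these assumptions, holding the lock in buffered mode for the whole of nfs_getattr() stalls every buffered write and every O_DIRECT request until the stat completes, which is what Trond objects to.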

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics
  2016-07-26 16:35                                                                                         ` Trond Myklebust
@ 2016-07-26 17:57                                                                                           ` Benjamin Coddington
  2016-07-26 18:07                                                                                             ` Trond Myklebust
  0 siblings, 1 reply; 69+ messages in thread
From: Benjamin Coddington @ 2016-07-26 17:57 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: hch, List Linux NFS Mailing

On 26 Jul 2016, at 12:35, Trond Myklebust wrote:

>> On Jul 26, 2016, at 12:32, Benjamin Coddington <bcodding@redhat.com> wrote:
>>
>> On 25 Jul 2016, at 14:41, Benjamin Coddington wrote:
>>
>>> On 25 Jul 2016, at 14:34, Trond Myklebust wrote:
>>>
>>>>> On Jul 25, 2016, at 14:26, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>>
>>>>> On 25 Jul 2016, at 12:39, Trond Myklebust wrote:
>>>>>
>>>>>>> On Jul 25, 2016, at 12:26, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>>>>
>>>>>>> On 21 Jul 2016, at 9:20, Trond Myklebust wrote:
>>>>>>>
>>>>>>>>> On Jul 21, 2016, at 09:05, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>>>>>>
>>>>>>>>> So back to Christoph's point earlier:
>>>>>>>>>
>>>>>>>>> On 17 Jul 2016, at 23:48, Christoph Hellwig wrote:
>>>>>>>>>> This one breaks xfstests generic/207 on block/scsi layout for me.  The
>>>>>>>>>> reason for that is that we need a layoutcommit after writing out all
>>>>>>>>>> data for the file for the file size to be updated on the server.
>>>>>>>>>
>>>>>>>>> You responded:
>>>>>>>>>
>>>>>>>>> On 18 Jul 2016, at 0:32, Trond Myklebust wrote:
>>>>>>>>>> I’m not understanding this argument. Why do we care if the file size is up
>>>>>>>>>> to date on the server if we’re not sending an actual GETATTR on the wire
>>>>>>>>>> to retrieve the file size?
>>>>>>>>>
>>>>>>>>> I guess the answer might be because we can get it back from the last
>>>>>>>>> LAYOUTCOMMIT.
>>>>>>>>>
>>>>>>>>
>>>>>>>> The patch that I followed up with should now ensure that we do not mark
>>>>>>>> the attribute cache as up to date if there is a LAYOUTCOMMIT pending.
>>>>>>>> IOW: when the pNFS write is done, it is expected to do 2 things:
>>>>>>>>
>>>>>>>> 1) mark the inode for LAYOUTCOMMIT
>>>>>>>> 2) mark the attribute cache as invalid (because we know the change
>>>>>>>>    attribute, mtime, ctime need to be updated)
>>>>>>>>
>>>>>>>> In the case of blocks pNFS write:
>>>>>>>> The call to pnfs_set_layoutcommit() in pnfs_ld_write_done() should take
>>>>>>>> care of (1).
>>>>>>>> The call to nfs_writeback_update_inode() in nfs4_write_done_cb() should
>>>>>>>> take care of (2).
>>>>>>>>
>>>>>>>> Provided that these 2 calls are performed in the above order, then any
>>>>>>>> call to nfs_getattr() which has not been preceded by a call to
>>>>>>>> nfs4_proc_layoutcommit() should trigger the call to
>>>>>>>> __nfs_revalidate_inode().
>>>>>>>
>>>>>>> I think the problem is that a following nfs_getattr() will fail to notice
>>>>>>> the size change in the case of a write_completion and layoutcommit occurring
>>>>>>> after nfs_getattr() has done pnfs_sync_inode() but before it has done
>>>>>>> nfs_update_inode().
>>>>>>>
>>>>>>> In the failing case there are two threads: one is doing writes, the other
>>>>>>> doing lstat on aio_complete via io_getevents(2).
>>>>>>>
>>>>>>> For each write completion the lstat thread tries to verify the file size.
>>>>>>>
>>>>>>> GETATTR Thread                  LAYOUTCOMMIT Thread
>>>>>>> --------------                  --------------------
>>>>>>>                                 write_completion sets LAYOUTCOMMIT (4096@0)
>>>>>>> --> nfs_getattr
>>>>>>
>>>>>> filemap_write_and_wait()
>>>>>>
>>>>>>> __nfs_revalidate_inode
>>>>>>> pnfs_sync_inode
>>>>>>
>>>>>> NFS_PROTO(inode)->getattr()
>>>>>>
>>>>>>> getattr sees 4096
>>>>>>>                                 write_completion sets LAYOUTCOMMIT (4096@4096)
>>>>>>>                                 sets LAYOUTCOMMITTING
>>>>>>>                                 clears LAYOUTCOMMIT
>>>>>>>                                 clears LAYOUTCOMMITTING
>>>>>>> nfs_refresh_inode
>>>>>>>  nfs_update_inode size is 4096
>>>>>>> <-- nfs_getattr
>>>>>>>
>>>>>>> At this point the cached attributes are seen as up to date, but the
>>>>>>> aio-dio-extend-stat program expects that second write_completion to be
>>>>>>> reflected in the file size.
>>>>>>>
>>>>>>
>>>>>> Why isn’t the filemap_write_and_wait() above resolving the race? I’d
>>>>>> expect that would move your “write completion sets LAYOUTCOMMIT” up to
>>>>>> before the pnfs_sync_inode().  In fact, in the patch that Christoph sent,
>>>>>> all he was doing was moving the pnfs_sync_inode() to immediately after
>>>>>> that filemap_write_and_wait() instead of relying on it in
>>>>>> _nfs_revalidate_inode.
>>>>>
>>>>> This is O_DIRECT, I've failed to mention yet.  The second write hasn't made
>>>>> it out of __nfs_pageio_add_request() at the time filemap_write_and_wait() is
>>>>> called.  It is sleeping in pnfs_update_layout() waiting on a LAYOUTGET and it
>>>>> doesn't resume until after filemap_write_and_wait().
>>>>
>>>> Wait, so you have 1 thread doing an O_DIRECT write() and another doing a
>>>> stat() in parallel? Why would there be an expectation that the filesystem
>>>> should serialise those system calls?
>>>
>>> Not exactly parallel, but synchronized on aio_complete.  A comment in
>>> generic/207's src/aio-dio-regress/aio-dio-extend-stat.c:
>>>
>>>   /*
>>>    * This was originally submitted to
>>>    * http://bugzilla.kernel.org/show_bug.cgi?id=6831 by
>>>    * Rafal Wijata <wijata@nec-labs.com>.  It caught a race in dio aio completion
>>>    * that would call aio_complete() before the dio callers would update i_size.
>>>    * A stat after io_getevents() would not see the new file size.
>>>    *
>>>    * The bug was fixed in the fs/direct-io.c completion reworking that appeared
>>>    * in 2.6.20.  This test should fail on 2.6.19.
>>>    */
>>>
>>> As far as I can see, this check is the whole point of generic/207.
>>
>> This would fix it up:
>>
>> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
>> index f108d58101f8..823700f827b6 100644
>> --- a/fs/nfs/inode.c
>> +++ b/fs/nfs/inode.c
>> @@ -661,6 +661,7 @@ int nfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
>>
>>        trace_nfs_getattr_enter(inode);
>>        /* Flush out writes to the server in order to update c/mtime.  */
>> +       nfs_start_io_read(inode);
>>        if (S_ISREG(inode->i_mode)) {
>>                err = filemap_write_and_wait(inode->i_mapping);
>>                if (err)
>> @@ -694,6 +695,7 @@ int nfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
>>                        stat->blksize = NFS_SERVER(inode)->dtsize;
>>        }
>> out:
>> +       nfs_end_io_read(inode);
>>        trace_nfs_getattr_exit(inode, err);
>>        return err;
>> }
>>
>> Trond, what do you think?  I'll take any additional silence as a sign to go
>> elsewhere.  :P
>
> No. The above locking excludes all writes as well as O_DIRECT reads…
> That’s worse than we had before.
>
> I’d rather understand _why_ the aio_complete() is failing to work
> correctly here. According to the analysis of the test case that you
> quoted yesterday, the O_DIRECT writes should have completed before we
> even call stat().
>
> Cheers
>   Trond

The O_DIRECT writes do complete, and every completion signals the other
thread to do stat(), but that completion does not update the size on the
server.  As we know, we need a LAYOUTCOMMIT.  After this patch, we're only
going to do a LAYOUTCOMMIT if nfs_need_revalidate_inode(inode).

So what happens is that the first write completes, and the first
nfs_getattr() triggers the first LAYOUTCOMMIT, and then a GETATTR.
Simultaneously, the second write is waiting on a LAYOUTGET.  The GETATTR
completes and sets the size to 4k.

Now the attributes are marked up to date with a size of 4k, and when the
second write completes, nfs_getattr() is called, nfs_need_revalidate_inode()
is false, and we don't bother to send another LAYOUTCOMMIT or GETATTR to
correct the cached file size.
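The sequence above can be replayed deterministically in a toy C model. It is a sketch under stated assumptions: all names are hypothetical, the metadata server is reduced to two numbers, a GETATTR is split into a "send" half and an "apply" half so that the window between them can be exercised, and the `valid` flag loosely stands in for what nfs_need_revalidate_inode() checks in the real client. It shows how a reply that predates the second write can mark the cache valid again after that write has already invalidated it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: every name here is hypothetical, not NFS client code. */
struct mds {                    /* metadata server */
    uint64_t size;              /* file size as recorded by the MDS */
    uint64_t written_end;       /* end of the data written via the layout */
};

struct attr_cache {             /* client-side cached attributes */
    uint64_t size;
    bool     valid;             /* false => next stat must revalidate */
    bool     layoutcommit_pending;
};

/* A pNFS write completes: data is on storage, MDS size not updated yet. */
static void write_done(struct mds *s, struct attr_cache *c, uint64_t end)
{
    if (end > s->written_end)
        s->written_end = end;
    c->layoutcommit_pending = true;     /* (1) mark for LAYOUTCOMMIT */
    c->valid = false;                   /* (2) invalidate cached attributes */
}

/* LAYOUTCOMMIT: the MDS size now covers everything written so far. */
static void layoutcommit(struct mds *s, struct attr_cache *c)
{
    if (s->written_end > s->size)
        s->size = s->written_end;
    c->layoutcommit_pending = false;
}

/* First half of a revalidation: LAYOUTCOMMIT if needed, then GETATTR. */
static uint64_t getattr_send(struct mds *s, struct attr_cache *c)
{
    if (c->layoutcommit_pending)
        layoutcommit(s, c);
    return s->size;                     /* size carried by the GETATTR reply */
}

/* Second half: apply the reply and mark the cache up to date. */
static void getattr_apply(struct attr_cache *c, uint64_t reply_size)
{
    c->size = reply_size;
    c->valid = true;                    /* overwrites any newer invalidation */
}

/* What stat() reports: revalidate only if the cache looks stale. */
static uint64_t stat_size(struct mds *s, struct attr_cache *c)
{
    if (!c->valid || c->layoutcommit_pending)
        getattr_apply(c, getattr_send(s, c));
    return c->size;
}
```

Replaying Ben's ordering (first write, LAYOUTCOMMIT plus GETATTR sent, second write completes and is committed, then the old reply is applied) leaves the cache marked valid at 4096 while the server already has 8192; without the interleaving, the same calls yield 8192.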

Here's a function graph of that: 
http://people.redhat.com/bcodding/borken

We could invalidate the attribute cache every time a write completes.  Maybe
nfs_writeback_update_inode() isn't doing the job for block layouts (are we
not setting res.count?  I'll look at that.)

I think we could also use the LAYOUTCOMMIT results to invalidate the cache.
RFC 5661, section 18.42.3:
    "If the metadata server updates the file's size as the
    result of the LAYOUTCOMMIT operation, it must return the new size
    (locr_newsize.ns_size) as part of the results."
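A minimal sketch of that second idea, assuming a client-side handler that consumes the LAYOUTCOMMIT reply: `struct newsize4` mirrors the shape of the newsize4 union in RFC 5661 (ns_sizechanged selects whether ns_size is present), while `struct cached_attrs` and apply_layoutcommit_result() are hypothetical and not the NFS client's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Mirrors the shape of the RFC 5661 XDR union newsize4:
 *   union newsize4 switch (bool ns_sizechanged) {
 *   case TRUE:  length4 ns_size;
 *   case FALSE: void;
 *   };
 */
struct newsize4 {
    bool     ns_sizechanged;
    uint64_t ns_size;           /* only meaningful if ns_sizechanged */
};

/* Hypothetical client-side cache; not the NFS client's nfs_inode. */
struct cached_attrs {
    uint64_t size;
    bool     valid;
};

/*
 * Hypothetical handler: if the MDS reports a new size in the LAYOUTCOMMIT
 * reply, take it and invalidate the remaining cached attributes (change
 * attribute, c/mtime) so the next stat() sends a GETATTR instead of
 * trusting the cache.
 */
static void apply_layoutcommit_result(struct cached_attrs *c,
                                      const struct newsize4 *ns)
{
    if (!ns->ns_sizechanged)
        return;
    c->size = ns->ns_size;
    c->valid = false;
}
```

Whether the client should ever accept a reported size smaller than the one it has cached is a policy question this sketch does not try to answer.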

I'm not sure you want to bother trying to reproduce this, but if so, you
don't need special hardware for SCSI layout.  I wrote up a quick how-to for
SCSI layouts on a VM in qemu:
http://people.redhat.com/bcodding/pnfs/nfs/scsi/2016/07/13/pnfs_scsi_setup_for_VMs/

Ben

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics
  2016-07-26 17:57                                                                                           ` Benjamin Coddington
@ 2016-07-26 18:07                                                                                             ` Trond Myklebust
  2016-07-27 11:55                                                                                               ` Benjamin Coddington
  0 siblings, 1 reply; 69+ messages in thread
From: Trond Myklebust @ 2016-07-26 18:07 UTC (permalink / raw)
  To: Coddington Benjamin; +Cc: hch, List Linux NFS Mailing

> On Jul 26, 2016, at 13:57, Benjamin Coddington <bcodding@redhat.com> wrote:
>
> On 26 Jul 2016, at 12:35, Trond Myklebust wrote:
>
>>> On Jul 26, 2016, at 12:32, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>
>>> On 25 Jul 2016, at 14:41, Benjamin Coddington wrote:
>>>
>>>> On 25 Jul 2016, at 14:34, Trond Myklebust wrote:
>>>>
>>>>>> On Jul 25, 2016, at 14:26, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 25 Jul 2016, at 12:39, Trond Myklebust wrote:
>>>>>>
>>>>>>>> On Jul 25, 2016, at 12:26, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>>>>>
>>>>>>>> On 21 Jul 2016, at 9:20, Trond Myklebust wrote:
>>>>>>>>
>>>>>>>>>> On Jul 21, 2016, at 09:05, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>>>>>>>
>>>>>>>>>> So back to Christoph's point earlier:
>>>>>>>>>>
>>>>>>>>>> On 17 Jul 2016, at 23:48, Christoph Hellwig wrote:
>>>>>>>>>>> This one breaks xfstests generic/207 on block/scsi layout for me.  The
>>>>>>>>>>> reason for that is that we need a layoutcommit after writing out all
>>>>>>>>>>> data for the file for the file size to be updated on the server.
>>>>>>>>>>
>>>>>>>>>> You responded:
>>>>>>>>>>
>>>>>>>>>> On 18 Jul 2016, at 0:32, Trond Myklebust wrote:
>>>>>>>>>>> I’m not understanding this argument. Why do we care if the file size is up
>>>>>>>>>>> to date on the server if we’re not sending an actual GETATTR on the wire
>>>>>>>>>>> to retrieve the file size?
>>>>>>>>>>
>>>>>>>>>> I guess the answer might be because we can get it back from the last
>>>>>>>>>> LAYOUTCOMMIT.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The patch that I followed up with should now ensure that we
>>>>>>>>> do not mark the attribute cache as up to date if there is a
>>>>>>>>> LAYOUTCOMMIT pending.
>>>>>>>>> IOW: when the pNFS write is done, it is expected to do 2 things:
>>>>>>>>>
>>>>>>>>> 1) mark the inode for LAYOUTCOMMIT
>>>>>>>>> 2) mark the attribute cache as invalid (because we know the
>>>>>>>>> change attribute, mtime, ctime need to be updated)
>>>>>>>>>
>>>>>>>>> In the case of blocks pNFS write:
>>>>>>>>> The call to pnfs_set_layoutcommit() in pnfs_ld_write_done()
>>>>>>>>> should take care of (1)
>>>>>>>>> The call to nfs_writeback_update_inode() in
>>>>>>>>> nfs4_write_done_cb() should take care of (2).
>>>>>>>>>
>>>>>>>>> Provided that these 2 calls are performed in the above order,
>>>>>>>>> then any call to nfs_getattr() which has not been preceded by
>>>>>>>>> a call to nfs4_proc_layoutcommit() should trigger the call to
>>>>>>>>> __nfs_revalidate_inode().
>>>>>>>>
>>>>>>>> I think the problem is that a following nfs_getattr() will fail to notice
>>>>>>>> the size change in the case of a write_completion and layoutcommit occurring
>>>>>>>> after nfs_getattr() has done pnfs_sync_inode() but before it has done
>>>>>>>> nfs_update_inode().
>>>>>>>>
>>>>>>>> In the failing case there are two threads: one is doing writes, the other
>>>>>>>> doing lstat on aio_complete via io_getevents(2).
>>>>>>>>
>>>>>>>> For each write completion the lstat thread tries to verify the file size.
>>>>>>>>
>>>>>>>> GETATTR Thread                  LAYOUTCOMMIT Thread
>>>>>>>> --------------                  --------------------
>>>>>>>>                             write_completion sets LAYOUTCOMMIT (4096@0)
>>>>>>>> --> nfs_getattr
>>>>>>>
>>>>>>> filemap_write_and_wait()
>>>>>>>
>>>>>>>> __nfs_revalidate_inode
>>>>>>>> pnfs_sync_inode
>>>>>>>
>>>>>>> NFS_PROTO(inode)->getattr()
>>>>>>>
>>>>>>>> getattr sees 4096
>>>>>>>>                             write_completion sets LAYOUTCOMMIT (4096@4096)
>>>>>>>>                             sets LAYOUTCOMMITTING
>>>>>>>>                             clears LAYOUTCOMMIT
>>>>>>>>                             clears LAYOUTCOMMITTING
>>>>>>>> nfs_refresh_inode
>>>>>>>> nfs_update_inode size is 4096
>>>>>>>> <-- nfs_getattr
>>>>>>>>
>>>>>>>> At this point the cached attributes are seen as up to date, but
>>>>>>>> the aio-dio-extend-stat program expects that second write_completion to be
>>>>>>>> reflected in the file size.
>>>>>>>>
>>>>>>>
>>>>>>> Why isn’t the filemap_write_and_wait() above resolving the race? I’d
>>>>>>> expect that would move your “write completion sets LAYOUTCOMMIT” up to
>>>>>>> before the pnfs_sync_inode().  In fact, in the patch that Christoph sent,
>>>>>>> all he was doing was moving the pnfs_sync_inode() to immediately after
>>>>>>> that filemap_write_and_wait() instead of relying on it in
>>>>>>> _nfs_revalidate_inode.
>>>>>>
>>>>>> This is O_DIRECT, I've failed to mention yet.  The second write hasn't made
>>>>>> it out of __nfs_pageio_add_request() at the time filemap_write_and_wait() is
>>>>>> called.  It is sleeping in pnfs_update_layout() waiting on a LAYOUTGET and it
>>>>>> doesn't resume until after filemap_write_and_wait().
>>>>>
>>>>> Wait, so you have 1 thread doing an O_DIRECT write() and another doing a
>>>>> stat() in parallel? Why would there be an expectation that the filesystem
>>>>> should serialise those system calls?
>>>>
>>>> Not exactly parallel, but synchronized on aio_complete.  A comment in
>>>> generic/207's src/aio-dio-regress/aio-dio-extend-stat.c:
>>>>
>>>> /*
>>>>  * This was originally submitted to
>>>>  * http://bugzilla.kernel.org/show_bug.cgi?id=6831 by
>>>>  * Rafal Wijata <wijata@nec-labs.com>.  It caught a race in dio aio completion
>>>>  * that would call aio_complete() before the dio callers would update i_size.
>>>>  * A stat after io_getevents() would not see the new file size.
>>>>  *
>>>>  * The bug was fixed in the fs/direct-io.c completion reworking that appeared
>>>>  * in 2.6.20.  This test should fail on 2.6.19.
>>>>  */
>>>>
>>>> As far as I can see, this check is the whole point of generic/207..
>>>
>>> This would fix it up:
>>>
>>> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
>>> index f108d58101f8..823700f827b6 100644
>>> --- a/fs/nfs/inode.c
>>> +++ b/fs/nfs/inode.c
>>> @@ -661,6 +661,7 @@ int nfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
>>>
>>>       trace_nfs_getattr_enter(inode);
>>>       /* Flush out writes to the server in order to update c/mtime.  */
>>> +       nfs_start_io_read(inode);
>>>       if (S_ISREG(inode->i_mode)) {
>>>               err = filemap_write_and_wait(inode->i_mapping);
>>>               if (err)
>>> @@ -694,6 +695,7 @@ int nfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
>>>                       stat->blksize = NFS_SERVER(inode)->dtsize;
>>>       }
>>> out:
>>> +       nfs_end_io_read(inode);
>>>       trace_nfs_getattr_exit(inode, err);
>>>       return err;
>>> }
>>>
>>> Trond, what do you think?  I'll take any additional silence as a sign to go
>>> elsewhere.  :P
>>
>> No. The above locking excludes all writes as well as O_DIRECT
>> reads… That’s worse than we had before.
>>
>> I’d like rather to understand _why_ the aio_complete() is failing
>> to work correctly here. According to the analysis of the test case
>> that you quoted yesterday, the O_DIRECT writes should have completed
>> before we even call stat().
>>
>> Cheers
>>  Trond
>
> The O_DIRECT writes do complete, and every completion signals the other
> thread to do stat(), but that completion does not update the size on the
> server.  As we know, we need a LAYOUTCOMMIT.  After this patch, we're only
> going to do a LAYOUTCOMMIT if nfs_need_revalidate_inode(inode).
>

So how is the completion happening? As far as I know, this is what is
supposed to happen:

- bl_write_cleanup() calls pnfs_ld_write_done(),
    - pnfs_ld_write_done then first calls pnfs_set_layoutcommit() and
      nfs_pgio_result() (which calls nfs_writeback_done() and eventually
      nfs_writeback_update_inode())
    - pnfs_ld_write_done then calls nfs_pgio_release(), which again
      calls nfs_direct_write_completion().

Is something setting hdr->pnfs_error and preventing the call to
pnfs_set_layoutcommit and nfs_writeback_done()? If not, then why is
the call to nfs_writeback_update_inode() not setting the file size?

> So what happens is that the first write completes, and the first
> nfs_getattr() triggers the first LAYOUTCOMMIT, and then a GETATTR.
> Simultaneously, the second write is waiting on a LAYOUTGET.  The GETATTR
> completes and sets the size to 4k.
>
> Now the attributes are marked up to date with a size of 4k, and when the
> second write completes, nfs_getattr() is called, nfs_need_revalidate_inode()
> is false, and we don't bother to send another LAYOUTCOMMIT or GETATTR to
> correct the cached file size.
>
> Here's a function graph of that: http://people.redhat.com/bcodding/borken
>
> We could invalidate the attribute cache every time a write completes.. maybe
> nfs_writeback_update_inode() isn't doing the job for block layouts (are we
> not setting res.count?  I'll look at that..)
>
> I think we could also use the LAYOUTCOMMIT results to invalidate the cache,
> RFC-5661 18.42.3:
>   If the metadata server updates the file's size as the
>   result of the LAYOUTCOMMIT operation, it must return the new size
>   (locr_newsize.ns_size) as part of the results."
>
> I'm not sure you want to bother to try to reproduce -- but if so, you don't
> need special hardware for SCSI layout.  I wrote up a quick how-to for SCSI
> layouts on a VM in qemu:
> http://people.redhat.com/bcodding/pnfs/nfs/scsi/2016/07/13/pnfs_scsi_setup_for_VMs/
>
> Ben

* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics
  2016-07-26 18:07                                                                                             ` Trond Myklebust
@ 2016-07-27 11:55                                                                                               ` Benjamin Coddington
  2016-07-27 12:15                                                                                                 ` Trond Myklebust
  0 siblings, 1 reply; 69+ messages in thread
From: Benjamin Coddington @ 2016-07-27 11:55 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: hch, List Linux NFS Mailing

On 26 Jul 2016, at 14:07, Trond Myklebust wrote:

>> On Jul 26, 2016, at 13:57, Benjamin Coddington <bcodding@redhat.com> wrote:
>>
>> On 26 Jul 2016, at 12:35, Trond Myklebust wrote:
>>
>>>> On Jul 26, 2016, at 12:32, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>
>>>> On 25 Jul 2016, at 14:41, Benjamin Coddington wrote:
>>>>
>>>>> On 25 Jul 2016, at 14:34, Trond Myklebust wrote:
>>>>>
>>>>>>> On Jul 25, 2016, at 14:26, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 25 Jul 2016, at 12:39, Trond Myklebust wrote:
>>>>>>>
>>>>>>>>> On Jul 25, 2016, at 12:26, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>>>>>>
>>>>>>>>> On 21 Jul 2016, at 9:20, Trond Myklebust wrote:
>>>>>>>>>
>>>>>>>>>>> On Jul 21, 2016, at 09:05, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> So back to Christoph's point earlier:
>>>>>>>>>>>
>>>>>>>>>>> On 17 Jul 2016, at 23:48, Christoph Hellwig wrote:
>>>>>>>>>>>> This one breaks xfstests generic/207 on block/scsi layout for me.  The
>>>>>>>>>>>> reason for that is that we need a layoutcommit after writing out all
>>>>>>>>>>>> data for the file for the file size to be updated on the server.
>>>>>>>>>>>
>>>>>>>>>>> You responded:
>>>>>>>>>>>
>>>>>>>>>>> On 18 Jul 2016, at 0:32, Trond Myklebust wrote:
>>>>>>>>>>>> I’m not understanding this argument. Why do we care if the file size is up
>>>>>>>>>>>> to date on the server if we’re not sending an actual GETATTR on the wire
>>>>>>>>>>>> to retrieve the file size?
>>>>>>>>>>>
>>>>>>>>>>> I guess the answer might be because we can get it back from the last
>>>>>>>>>>> LAYOUTCOMMIT.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The patch that I followed up with should now ensure that we
>>>>>>>>>> do not mark the attribute cache as up to date if there is a
>>>>>>>>>> LAYOUTCOMMIT pending.
>>>>>>>>>> IOW: when the pNFS write is done, it is expected to do 2 things:
>>>>>>>>>>
>>>>>>>>>> 1) mark the inode for LAYOUTCOMMIT
>>>>>>>>>> 2) mark the attribute cache as invalid (because we know the
>>>>>>>>>> change attribute, mtime, ctime need to be updated)
>>>>>>>>>>
>>>>>>>>>> In the case of blocks pNFS write:
>>>>>>>>>> The call to pnfs_set_layoutcommit() in pnfs_ld_write_done()
>>>>>>>>>> should take care of (1)
>>>>>>>>>> The call to nfs_writeback_update_inode() in
>>>>>>>>>> nfs4_write_done_cb() should take care of (2).
>>>>>>>>>>
>>>>>>>>>> Provided that these 2 calls are performed in the above order,
>>>>>>>>>> then any call to nfs_getattr() which has not been preceded by
>>>>>>>>>> a call to nfs4_proc_layoutcommit() should trigger the call to
>>>>>>>>>> __nfs_revalidate_inode().
>>>>>>>>>
>>>>>>>>> I think the problem is that a following nfs_getattr() will fail to notice
>>>>>>>>> the size change in the case of a write_completion and layoutcommit occurring
>>>>>>>>> after nfs_getattr() has done pnfs_sync_inode() but before it has done
>>>>>>>>> nfs_update_inode().
>>>>>>>>>
>>>>>>>>> In the failing case there are two threads: one is doing writes, the other
>>>>>>>>> doing lstat on aio_complete via io_getevents(2).
>>>>>>>>>
>>>>>>>>> For each write completion the lstat thread tries to verify the file size.
>>>>>>>>>
>>>>>>>>> GETATTR Thread                  LAYOUTCOMMIT Thread
>>>>>>>>> --------------                  --------------------
>>>>>>>>>                             write_completion sets LAYOUTCOMMIT (4096@0)
>>>>>>>>> --> nfs_getattr
>>>>>>>>
>>>>>>>> filemap_write_and_wait()
>>>>>>>>
>>>>>>>>> __nfs_revalidate_inode
>>>>>>>>> pnfs_sync_inode
>>>>>>>>
>>>>>>>> NFS_PROTO(inode)->getattr()
>>>>>>>>
>>>>>>>>> getattr sees 4096
>>>>>>>>>                             write_completion sets LAYOUTCOMMIT (4096@4096)
>>>>>>>>>                             sets LAYOUTCOMMITTING
>>>>>>>>>                             clears LAYOUTCOMMIT
>>>>>>>>>                             clears LAYOUTCOMMITTING
>>>>>>>>> nfs_refresh_inode
>>>>>>>>> nfs_update_inode size is 4096
>>>>>>>>> <-- nfs_getattr
>>>>>>>>>
>>>>>>>>> At this point the cached attributes are seen as up to date, but
>>>>>>>>> the aio-dio-extend-stat program expects that second write_completion to be
>>>>>>>>> reflected in the file size.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Why isn’t the filemap_write_and_wait() above resolving the race? I’d
>>>>>>>> expect that would move your “write completion sets LAYOUTCOMMIT” up to
>>>>>>>> before the pnfs_sync_inode().  In fact, in the patch that Christoph sent,
>>>>>>>> all he was doing was moving the pnfs_sync_inode() to immediately after
>>>>>>>> that filemap_write_and_wait() instead of relying on it in
>>>>>>>> _nfs_revalidate_inode.
>>>>>>>
>>>>>>> This is O_DIRECT, I've failed to mention yet.  The second write hasn't made
>>>>>>> it out of __nfs_pageio_add_request() at the time filemap_write_and_wait() is
>>>>>>> called.  It is sleeping in pnfs_update_layout() waiting on a LAYOUTGET and it
>>>>>>> doesn't resume until after filemap_write_and_wait().
>>>>>>
>>>>>> Wait, so you have 1 thread doing an O_DIRECT write() and another doing a
>>>>>> stat() in parallel? Why would there be an expectation that the filesystem
>>>>>> should serialise those system calls?
>>>>>
>>>>> Not exactly parallel, but synchronized on aio_complete.  A comment in
>>>>> generic/207's src/aio-dio-regress/aio-dio-extend-stat.c:
>>>>>
>>>>> /*
>>>>>  * This was originally submitted to
>>>>>  * http://bugzilla.kernel.org/show_bug.cgi?id=6831 by
>>>>>  * Rafal Wijata <wijata@nec-labs.com>.  It caught a race in dio aio completion
>>>>>  * that would call aio_complete() before the dio callers would update i_size.
>>>>>  * A stat after io_getevents() would not see the new file size.
>>>>>  *
>>>>>  * The bug was fixed in the fs/direct-io.c completion reworking that appeared
>>>>>  * in 2.6.20.  This test should fail on 2.6.19.
>>>>>  */
>>>>>
>>>>> As far as I can see, this check is the whole point of generic/207..
>>>>
>>>> This would fix it up:
>>>>
>>>> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
>>>> index f108d58101f8..823700f827b6 100644
>>>> --- a/fs/nfs/inode.c
>>>> +++ b/fs/nfs/inode.c
>>>> @@ -661,6 +661,7 @@ int nfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
>>>>
>>>>       trace_nfs_getattr_enter(inode);
>>>>       /* Flush out writes to the server in order to update c/mtime.  */
>>>> +       nfs_start_io_read(inode);
>>>>       if (S_ISREG(inode->i_mode)) {
>>>>               err = filemap_write_and_wait(inode->i_mapping);
>>>>               if (err)
>>>> @@ -694,6 +695,7 @@ int nfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
>>>>                       stat->blksize = NFS_SERVER(inode)->dtsize;
>>>>       }
>>>> out:
>>>> +       nfs_end_io_read(inode);
>>>>       trace_nfs_getattr_exit(inode, err);
>>>>       return err;
>>>> }
>>>>
>>>> Trond, what do you think?  I'll take any additional silence as a sign to go
>>>> elsewhere.  :P
>>>
>>> No. The above locking excludes all writes as well as O_DIRECT
>>> reads… That’s worse than we had before.
>>>
>>> I’d like rather to understand _why_ the aio_complete() is failing
>>> to work correctly here. According to the analysis of the test case
>>> that you quoted yesterday, the O_DIRECT writes should have completed
>>> before we even call stat().
>>>
>>> Cheers
>>>  Trond
>>
>> The O_DIRECT writes do complete, and every completion signals the other
>> thread to do stat(), but that completion does not update the size on the
>> server.  As we know, we need a LAYOUTCOMMIT.  After this patch, we're only
>> going to do a LAYOUTCOMMIT if nfs_need_revalidate_inode(inode).
>>
>
> So how is the completion happening? As far as I know, this is what is
> supposed to happen:
>
> - bl_write_cleanup() calls pnfs_ld_write_done(),
>     - pnfs_ld_write_done then first calls pnfs_set_layoutcommit() and
>       nfs_pgio_result() (which calls nfs_writeback_done() and eventually
>       nfs_writeback_update_inode())
>     - pnfs_ld_write_done then calls nfs_pgio_release(), which again
>       calls nfs_direct_write_completion().
>
> Is something setting hdr->pnfs_error and preventing the call to
> pnfs_set_layoutcommit and nfs_writeback_done()? If not, then why is
> the call to nfs_writeback_update_inode() not setting the file size?
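
The expected completion ordering in the quoted text can be modelled in a few lines of single-threaded C. Everything below is hypothetical. `struct wb_header`, `struct inode_model` and the `model_*` helpers stand in for `hdr`, the NFS inode and the kernel functions named above, and only their ordering comes from the thread. If `pnfs_error` is non-zero, the model simply skips steps (1) and (2), which is the case being asked about. The real error handling is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the inode flags named in the thread. */
enum {
	F_LAYOUTCOMMIT = 1u << 0,	/* like NFS_INO_LAYOUTCOMMIT */
	F_INVALID_ATTR = 1u << 1,	/* like NFS_INO_INVALID_ATTR */
};

struct inode_model {
	unsigned int flags;
	uint64_t size;			/* cached i_size */
	uint64_t lwb;			/* like plh_lwb: end of data to commit */
	bool completed;			/* the direct-write completion ran */
	bool flags_seen_at_completion;	/* both flags set when it ran? */
};

struct wb_header {
	uint64_t offset, count;
	int pnfs_error;			/* like hdr->pnfs_error */
};

/* (1) roughly what pnfs_set_layoutcommit() is expected to do */
static void model_set_layoutcommit(struct inode_model *i, uint64_t end_pos)
{
	i->flags |= F_LAYOUTCOMMIT;
	if (end_pos > i->lwb)
		i->lwb = end_pos;
}

/* (2) roughly what nfs_writeback_update_inode() is expected to do */
static void model_writeback_update_inode(struct inode_model *i, uint64_t end_pos)
{
	if (end_pos > i->size)
		i->size = end_pos;
	i->flags |= F_INVALID_ATTR;
}

/* (3) the completion that wakes the thread waiting to call stat() */
static void model_direct_write_completion(struct inode_model *i)
{
	unsigned int both = F_LAYOUTCOMMIT | F_INVALID_ATTR;

	i->flags_seen_at_completion = (i->flags & both) == both;
	i->completed = true;
}

/* Ordering described for pnfs_ld_write_done(): (1), (2), then release. */
static void model_ld_write_done(struct inode_model *i, const struct wb_header *hdr)
{
	uint64_t end_pos = hdr->offset + hdr->count;

	if (hdr->pnfs_error == 0) {
		model_set_layoutcommit(i, end_pos);
		model_writeback_update_inode(i, end_pos);
	}
	model_direct_write_completion(i);	/* always signalled last */
}
```

Recording whether both flags were already set when the completion ran makes the ordering requirement checkable. A stat() woken by that completion should find the attribute cache invalid and a LAYOUTCOMMIT pending.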

After adding more debugging, I see that all of that is working correctly,
but the first LAYOUTCOMMIT is taking the size back down to 4096 from the
last nfs_writeback_done(), and the second LAYOUTCOMMIT never brings it back
up again.

Now I see that we should be marking the block extents as written atomically with
setting LAYOUTCOMMIT and nfsi->layout->plh_lwb, otherwise a LAYOUTCOMMIT can
collect extents just added from the next bl_write_cleanup().  Then, the next
LAYOUTCOMMIT fails, and all we're left with is the size from the first
LAYOUTCOMMIT.  Not sure if that particular problem is the whole fix, but
that's something to work on.

I see ways to fix that:

     - make a new pnfs_set_layoutcommit_locked() that can be used to call
       ext_tree_mark_written() inside the i_lock

     - make another pnfs_layoutdriver_type operation to be used within
       pnfs_set_layoutcommit (mark_layoutcommit? set_layoutcommit?), and call
       ext_tree_mark_written() within that..

     - have .prepare_layoutcommit return a new positive plh_lwb that would
       extend the current LAYOUTCOMMIT

     - make ext_tree_prepare_commit only encode up to plh_lwb
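
The first option can be illustrated with a user-space sketch: do the extent marking inside the same critical section that sets LAYOUTCOMMIT and the lwb. The sketch assumes a pthread mutex in place of the inode's i_lock and a fixed array in place of the block layout's extent tree. `struct layout_model`, `model_mark_written_and_set_layoutcommit()` and `model_prepare_commit()` are hypothetical names, not kernel APIs. Writers and the commit preparation take the same lock. A LAYOUTCOMMIT therefore cannot collect an extent whose end is not yet reflected in the lwb it sends.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_EXTENTS 16

struct extent_model {
	uint64_t start, len;
	bool written;		/* data on disk, needs LAYOUTCOMMIT */
	bool committing;	/* picked up by an in-flight LAYOUTCOMMIT */
};

struct layout_model {
	pthread_mutex_t lock;	/* stands in for inode->i_lock */
	bool layoutcommit;	/* stands in for NFS_INO_LAYOUTCOMMIT */
	uint64_t lwb;		/* stands in for plh_lwb (exclusive end) */
	struct extent_model ext[MAX_EXTENTS];
	size_t n;
};

/* Record a completed write: extent, flag and lwb all change together. */
static void model_mark_written_and_set_layoutcommit(struct layout_model *l,
						    uint64_t start, uint64_t len)
{
	pthread_mutex_lock(&l->lock);
	assert(l->n < MAX_EXTENTS);
	l->ext[l->n++] = (struct extent_model){ start, len, true, false };
	l->layoutcommit = true;
	if (start + len > l->lwb)
		l->lwb = start + len;
	pthread_mutex_unlock(&l->lock);
}

/*
 * Snapshot what a LAYOUTCOMMIT would send.  Because writers update under
 * the same lock, every extent collected here is covered by *lwb_out.
 */
static size_t model_prepare_commit(struct layout_model *l, uint64_t *lwb_out)
{
	size_t collected = 0;

	pthread_mutex_lock(&l->lock);
	for (size_t k = 0; k < l->n; k++) {
		if (l->ext[k].written && !l->ext[k].committing) {
			l->ext[k].committing = true;
			assert(l->ext[k].start + l->ext[k].len <= l->lwb);
			collected++;
		}
	}
	*lwb_out = l->lwb;
	l->layoutcommit = false;
	pthread_mutex_unlock(&l->lock);
	return collected;
}
```

The design choice is simply that "extent marked written" and "LAYOUTCOMMIT needed up to lwb" become one state transition rather than two. A concurrent prepare step can only observe both or neither.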

Ben

* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics
  2016-07-27 11:55                                                                                               ` Benjamin Coddington
@ 2016-07-27 12:15                                                                                                 ` Trond Myklebust
  2016-07-27 12:31                                                                                                   ` Trond Myklebust
  0 siblings, 1 reply; 69+ messages in thread
From: Trond Myklebust @ 2016-07-27 12:15 UTC (permalink / raw)
  To: Coddington Benjamin; +Cc: hch, List Linux NFS Mailing

> On Jul 27, 2016, at 07:55, Benjamin Coddington <bcodding@redhat.com> wrote:
>
> After adding more debugging, I see that all of that is working correctly,
> but the first LAYOUTCOMMIT is taking the size back down to 4096 from the
> last nfs_writeback_done(), and the second LAYOUTCOMMIT never brings it back
> up again.
>

Excellent! Thanks for debugging that.

> Now I see that we should be marking the block extents as written atomically with
> setting LAYOUTCOMMIT and nfsi->layout->plh_lwb, otherwise a LAYOUTCOMMIT can
> collect extents just added from the next bl_write_cleanup().  Then, the next
> LAYOUTCOMMIT fails, and all we're left with is the size from the first
> LAYOUTCOMMIT.  Not sure if that particular problem is the whole fix, but
> that's something to work on.
>
> I see ways to fix that:
>
>    - make a new pnfs_set_layoutcommit_locked() that can be used to call
>      ext_tree_mark_written() inside the i_lock
>
>    - make another pnfs_layoutdriver_type operation to be used within
>      pnfs_set_layoutcommit (mark_layoutcommit? set_layoutcommit?), and call
>      ext_tree_mark_written() within that..
>
>    - have .prepare_layoutcommit return a new positive plh_lwb that would
>      extend the current LAYOUTCOMMIT
>
>    - make ext_tree_prepare_commit only encode up to plh_lwb

I see no reason why ext_tree_prepare_commit() shouldn’t be allowed
to extend the args->lastbytewritten. This is a metadata operation
that is owned by the pNFS layout driver.
The only thing I’d note is you should then rewrite the failure case
in pnfs_layoutcommit_inode() so that it doesn’t rely on the saved
“end_pos”, but uses args->lastbytewritten instead (with a comment
to the effect why)…

Cheers
  Trond

* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics
  2016-07-27 12:15                                                                                                 ` Trond Myklebust
@ 2016-07-27 12:31                                                                                                   ` Trond Myklebust
  2016-07-27 16:14                                                                                                     ` Benjamin Coddington
  0 siblings, 1 reply; 69+ messages in thread
From: Trond Myklebust @ 2016-07-27 12:31 UTC (permalink / raw)
  To: Coddington Benjamin; +Cc: hch, List Linux NFS Mailing

> On Jul 27, 2016, at 08:15, Trond Myklebust <trondmy@primarydata.com> wrote:
>
>
>> On Jul 27, 2016, at 07:55, Benjamin Coddington <bcodding@redhat.com> wrote:
>>
>> After adding more debugging, I see that all of that is working correctly,
>> but the first LAYOUTCOMMIT is taking the size back down to 4096 from the
>> last nfs_writeback_done(), and the second LAYOUTCOMMIT never brings it back
>> up again.
>>
>
> Excellent! Thanks for debugging that.
>
>> Now I see that we should be marking the block extents as written atomically with
>> setting LAYOUTCOMMIT and nfsi->layout->plh_lwb, otherwise a LAYOUTCOMMIT can
>> collect extents just added from the next bl_write_cleanup().  Then, the next
>> LAYOUTCOMMIT fails, and all we're left with is the size from the first
>> LAYOUTCOMMIT.  Not sure if that particular problem is the whole fix, but
>> that's something to work on.
>>
>> I see ways to fix that:
>>
>>   - make a new pnfs_set_layoutcommit_locked() that can be used to call
>>     ext_tree_mark_written() inside the i_lock
>>
>>   - make another pnfs_layoutdriver_type operation to be used within
>>     pnfs_set_layoutcommit (mark_layoutcommit? set_layoutcommit?), and call
>>     ext_tree_mark_written() within that..
>>
>>   - have .prepare_layoutcommit return a new positive plh_lwb that would
>>     extend the current LAYOUTCOMMIT
>>
>>   - make ext_tree_prepare_commit only encode up to plh_lwb
>
> I see no reason why ext_tree_prepare_commit() shouldn’t be allowed
> to extend the args->lastbytewritten. This is a metadata operation
> that is owned by the pNFS layout driver.
> The only thing I’d note is you should then rewrite the failure case
> in pnfs_layoutcommit_inode() so that it doesn’t rely on the saved
> “end_pos”, but uses args->lastbytewritten instead (with a comment
> to the effect why)…

In fact, given the potential for races here, I think the right thing
to do is to have ext_tree_prepare_commit() always set the correct
value for args->lastbytewritten.

Cheers
  Trond

* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics
  2016-07-27 12:31                                                                                                   ` Trond Myklebust
@ 2016-07-27 16:14                                                                                                     ` Benjamin Coddington
  2016-07-27 18:05                                                                                                       ` Trond Myklebust
  0 siblings, 1 reply; 69+ messages in thread
From: Benjamin Coddington @ 2016-07-27 16:14 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: hch, List Linux NFS Mailing

On 27 Jul 2016, at 8:31, Trond Myklebust wrote:

>> On Jul 27, 2016, at 08:15, Trond Myklebust <trondmy@primarydata.com> wrote:
>>
>>
>>> On Jul 27, 2016, at 07:55, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>
>>> After adding more debugging, I see that all of that is working correctly,
>>> but the first LAYOUTCOMMIT is taking the size back down to 4096 from the
>>> last nfs_writeback_done(), and the second LAYOUTCOMMIT never brings it back
>>> up again.
>>>
>>
>> Excellent! Thanks for debugging that.
>>
>>> Now I see that we should be marking the block extents as written atomically with
>>> setting LAYOUTCOMMIT and nfsi->layout->plh_lwb, otherwise a LAYOUTCOMMIT can
>>> collect extents just added from the next bl_write_cleanup().  Then, the next
>>> LAYOUTCOMMIT fails, and all we're left with is the size from the first
>>> LAYOUTCOMMIT.  Not sure if that particular problem is the whole fix, but
>>> that's something to work on.
>>>
>>> I see ways to fix that:
>>>
>>>   - make a new pnfs_set_layoutcommit_locked() that can be used to call
>>>     ext_tree_mark_written() inside the i_lock
>>>
>>>   - make another pnfs_layoutdriver_type operation to be used within
>>>     pnfs_set_layoutcommit (mark_layoutcommit? set_layoutcommit?), and call
>>>     ext_tree_mark_written() within that..
>>>
>>>   - have .prepare_layoutcommit return a new positive plh_lwb that would
>>>     extend the current LAYOUTCOMMIT
>>>
>>>   - make ext_tree_prepare_commit only encode up to plh_lwb
>>
>> I see no reason why ext_tree_prepare_commit() shouldn’t be allowed
>> to extend the args->lastbytewritten. This is a metadata operation
>> that is owned by the pNFS layout driver.
>> The only thing I’d note is you should then rewrite the failure case
>> in pnfs_layoutcommit_inode() so that it doesn’t rely on the saved
>> “end_pos”, but uses args->lastbytewritten instead (with a comment
>> to the effect why)…
>
> In fact, given the potential for races here, I think the right thing
> to do is to have ext_tree_prepare_commit() always set the correct
> value for args->lastbytewritten.
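
The approach in the quoted reply can be sketched minimally. The prepare step derives `lastbytewritten` from the extents it actually encodes. The failure path rebuilds the lwb from that value rather than from a saved end position. The types and function names are hypothetical, and this is not the blocklayout extent-tree code. The sketch assumes the lwb is an exclusive end offset, so `lastbytewritten` is one less than it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct ext {
	uint64_t start, len;
	bool written;		/* data on disk, not yet committed */
	bool committing;	/* encoded into the in-flight LAYOUTCOMMIT */
};

struct commit_args {
	uint64_t lastbytewritten;	/* offset of the last byte covered */
	size_t nr_encoded;		/* extents encoded into this commit */
};

/*
 * Prepare step: encode every written extent not already being committed
 * and make lastbytewritten cover all of them, including extents that
 * completed after the caller sampled its lwb.
 */
static void prepare_commit_args(struct ext *e, size_t n, uint64_t lwb,
				struct commit_args *args)
{
	uint64_t end = lwb;	/* exclusive end offset */

	assert(e != NULL || n == 0);
	args->nr_encoded = 0;
	for (size_t k = 0; k < n; k++) {
		if (!e[k].written || e[k].committing)
			continue;
		e[k].committing = true;
		args->nr_encoded++;
		if (e[k].start + e[k].len > end)
			end = e[k].start + e[k].len;
	}
	args->lastbytewritten = end > 0 ? end - 1 : 0;
}

/*
 * Failure path: hand the extents back and recompute the lwb from
 * args->lastbytewritten instead of an end position saved before the
 * prepare step.  Assumes only one LAYOUTCOMMIT is in flight.
 */
static uint64_t restore_lwb_on_failure(struct ext *e, size_t n, uint64_t cur_lwb,
				       const struct commit_args *args)
{
	uint64_t end;

	for (size_t k = 0; k < n; k++)
		e[k].committing = false;
	if (args->nr_encoded == 0)
		return cur_lwb;
	end = args->lastbytewritten + 1;
	return end > cur_lwb ? end : cur_lwb;
}
```

Take extents [0, 4096) and [4096, 8192), both written, with the lwb sampled as 4096. `prepare_commit_args()` still reports `lastbytewritten == 8191`, and a failed commit restores an lwb of 8192.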

OK, that has cleared up that common failure case that was getting in the
way, but now it can still fail like this:

nfs_writeback_update_inode sets size 4096 w/ NFS_INO_INVALID_ATTR set, 
and sets NFS_INO_LAYOUTCOMMIT
1st nfs_getattr -> pnfs_layoutcommit_inode starts, clears layoutcommit 
flag sets NFS_INO_LAYOUTCOMMITING
nfs_writeback_update_inode sets size 8192 w/ NFS_INO_INVALID_ATTR set, 
and sets NFS_INO_LAYOUTCOMMIT
1st nfs_getattr -> nfs4_layoutcommit_release sets size 4096, 
NFS_INO_INVALID_ATTR set, clears NFS_INO_LAYOUTCOMMITTING
1st nfs_getattr -> __revalidate_inode sets size 4096, 
NFS_INO_INVALID_ATTR not set.. cache is valid
2nd nfs_getattr immediately returns 4096 even though 
NFS_INO_LAYOUTCOMMIT

Ben
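Trond's suggestion quoted above (let the layout driver's prepare step set args->lastbytewritten itself) can be sketched as a small user-space C model. It is an illustration under stated assumptions, not the blocklayout code: `struct model_extent`, `struct model_commit_args`, the `EXT_*` states and `model_prepare_commit()` are hypothetical stand-ins for the ext_tree machinery, and the rule that the value is only ever extended, never lowered, is one reading of "allowed to extend".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent states; NOT the blocklayout ext_tree definitions. */
enum model_ext_state { EXT_CLEAN, EXT_NEEDS_COMMIT, EXT_COMMITTING };

struct model_extent {
	uint64_t offset;            /* first byte covered */
	uint64_t length;            /* number of bytes covered */
	enum model_ext_state state;
};

struct model_commit_args {
	uint64_t lastbytewritten;   /* value to send in LAYOUTCOMMIT */
	size_t nr_encoded;          /* extents included in this LAYOUTCOMMIT */
};

/*
 * Collect every extent still waiting for a LAYOUTCOMMIT and make sure
 * args->lastbytewritten covers all of them, including extents that were
 * marked written after the caller sampled its own end-of-write position.
 */
static void model_prepare_commit(struct model_extent *ext, size_t n,
				 struct model_commit_args *args)
{
	size_t i;

	assert(args != NULL && (ext != NULL || n == 0));
	args->nr_encoded = 0;
	for (i = 0; i < n; i++) {
		uint64_t last;

		if (ext[i].state != EXT_NEEDS_COMMIT || ext[i].length == 0)
			continue;
		last = ext[i].offset + ext[i].length - 1;
		/* Only ever extend the value the caller supplied. */
		if (last > args->lastbytewritten)
			args->lastbytewritten = last;
		ext[i].state = EXT_COMMITTING;
		args->nr_encoded++;
	}
}
```

For example, if the caller passes lastbytewritten = 4095 but an extent covering bytes 4096-8191 was marked written in the meantime, the model encodes that extent and raises the value to 8191, so the size reported by this LAYOUTCOMMIT matches the extents it actually carries.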

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics
  2016-07-27 16:14                                                                                                     ` Benjamin Coddington
@ 2016-07-27 18:05                                                                                                       ` Trond Myklebust
  2016-07-28  9:47                                                                                                         ` Benjamin Coddington
  0 siblings, 1 reply; 69+ messages in thread
From: Trond Myklebust @ 2016-07-27 18:05 UTC (permalink / raw)
  To: Coddington Benjamin; +Cc: hch, List Linux NFS Mailing

> On Jul 27, 2016, at 12:14, Benjamin Coddington <bcodding@redhat.com> wrote:
>
> On 27 Jul 2016, at 8:31, Trond Myklebust wrote:
>
>>> On Jul 27, 2016, at 08:15, Trond Myklebust <trondmy@primarydata.com> wrote:
>>>
>>>
>>>> On Jul 27, 2016, at 07:55, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>
>>>> After adding more debugging, I see that all of that is working correctly,
>>>> but the first LAYOUTCOMMIT is taking the size back down to 4096 from the
>>>> last nfs_writeback_done(), and the second LAYOUTCOMMIT never brings it back
>>>> up again.
>>>>
>>>
>>> Excellent! Thanks for debugging that.
>>>
>>>> Now I see that we should be marking the block extents as written atomically with
>>>> setting LAYOUTCOMMIT and nfsi->layout->plh_lwb, otherwise a LAYOUTCOMMIT can
>>>> collect extents just added from the next bl_write_cleanup().  Then, the next
>>>> LAYOUTCOMMIT fails, and all we're left with is the size from the first
>>>> LAYOUTCOMMIT.  Not sure if that particular problem is the whole fix, but
>>>> that's something to work on.
>>>>
>>>> I see ways to fix that:
>>>>
>>>>  - make a new pnfs_set_layoutcommit_locked() that can be used to call
>>>>    ext_tree_mark_written() inside the i_lock
>>>>
>>>>  - make another pnfs_layoutdriver_type operation to be used within
>>>>    pnfs_set_layoutcommit (mark_layoutcommit? set_layoutcommit?), and call
>>>>    ext_tree_mark_written() within that.
>>>>
>>>>  - have .prepare_layoutcommit return a new positive plh_lwb that would
>>>>    extend the current LAYOUTCOMMIT
>>>>
>>>>  - make ext_tree_prepare_commit only encode up to plh_lwb
>>>
>>> I see no reason why ext_tree_prepare_commit() shouldn’t be allowed to extend the args->lastbytewritten. This is a metadata operation that is owned by the pNFS layout driver.
>>> The only thing I’d note is you should then rewrite the failure case in pnfs_layoutcommit_inode() so that it doesn’t rely on the saved “end_pos”, but uses args->lastbytewritten instead (with a comment to the effect why)…
>>
>> In fact, given the potential for races here, I think the right thing to do is to have ext_tree_prepare_commit() always set the correct value for args->lastbytewritten.
>
> OK, that has cleared up that common failure case that was getting in the
> way, but now it can still fail like this:
>

Good progress! :-)

> nfs_writeback_update_inode sets size 4096 w/ NFS_INO_INVALID_ATTR set, and sets NFS_INO_LAYOUTCOMMIT
> 1st nfs_getattr -> pnfs_layoutcommit_inode starts, clears layoutcommit flag, sets NFS_INO_LAYOUTCOMMITTING
> nfs_writeback_update_inode sets size 8192 w/ NFS_INO_INVALID_ATTR set, and sets NFS_INO_LAYOUTCOMMIT
> 1st nfs_getattr -> nfs4_layoutcommit_release sets size 4096, NFS_INO_INVALID_ATTR set, clears NFS_INO_LAYOUTCOMMITTING
> 1st nfs_getattr -> __revalidate_inode sets size 4096, NFS_INO_INVALID_ATTR not set; cache is valid
> 2nd nfs_getattr immediately returns 4096 even though NFS_INO_LAYOUTCOMMIT is set

Is this being tested on top of the current linux-next/testing? Normally, I’d expect http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=10b7e9ad44881fcd46ac24eb7374377c6e8962ed to cause 1st nfs_getattr() to not declare the cache valid.

Cheers
  Trond

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics
  2016-07-27 18:05                                                                                                       ` Trond Myklebust
@ 2016-07-28  9:47                                                                                                         ` Benjamin Coddington
  2016-07-28 12:31                                                                                                           ` Trond Myklebust
  0 siblings, 1 reply; 69+ messages in thread
From: Benjamin Coddington @ 2016-07-28  9:47 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: hch, List Linux NFS Mailing


On 27 Jul 2016, at 14:05, Trond Myklebust wrote:

>> On Jul 27, 2016, at 12:14, Benjamin Coddington <bcodding@redhat.com> wrote:
>>
>> On 27 Jul 2016, at 8:31, Trond Myklebust wrote:
>>
>>>> On Jul 27, 2016, at 08:15, Trond Myklebust <trondmy@primarydata.com> wrote:
>>>>
>>>>
>>>>> On Jul 27, 2016, at 07:55, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>>
>>>>> After adding more debugging, I see that all of that is working correctly,
>>>>> but the first LAYOUTCOMMIT is taking the size back down to 4096 from the
>>>>> last nfs_writeback_done(), and the second LAYOUTCOMMIT never brings it back
>>>>> up again.
>>>>>
>>>>
>>>> Excellent! Thanks for debugging that.
>>>>
>>>>> Now I see that we should be marking the block extents as written atomically with
>>>>> setting LAYOUTCOMMIT and nfsi->layout->plh_lwb, otherwise a LAYOUTCOMMIT can
>>>>> collect extents just added from the next bl_write_cleanup().  Then, the next
>>>>> LAYOUTCOMMIT fails, and all we're left with is the size from the first
>>>>> LAYOUTCOMMIT.  Not sure if that particular problem is the whole fix, but
>>>>> that's something to work on.
>>>>>
>>>>> I see ways to fix that:
>>>>>
>>>>>  - make a new pnfs_set_layoutcommit_locked() that can be used to call
>>>>>    ext_tree_mark_written() inside the i_lock
>>>>>
>>>>>  - make another pnfs_layoutdriver_type operation to be used within
>>>>>    pnfs_set_layoutcommit (mark_layoutcommit? set_layoutcommit?), and call
>>>>>    ext_tree_mark_written() within that.
>>>>>
>>>>>  - have .prepare_layoutcommit return a new positive plh_lwb that would
>>>>>    extend the current LAYOUTCOMMIT
>>>>>
>>>>>  - make ext_tree_prepare_commit only encode up to plh_lwb
>>>>
>>>> I see no reason why ext_tree_prepare_commit() shouldn’t be
>>>> allowed to extend the args->lastbytewritten. This is a metadata
>>>> operation that is owned by the pNFS layout driver.
>>>> The only thing I’d note is you should then rewrite the failure
>>>> case in pnfs_layoutcommit_inode() so that it doesn’t rely on the
>>>> saved “end_pos”, but uses args->lastbytewritten instead (with a
>>>> comment to the effect why)…
>>>
>>> In fact, given the potential for races here, I think the right thing
>>> to do is to have ext_tree_prepare_commit() always set the correct
>>> value for args->lastbytewritten.
>>
>> OK, that has cleared up that common failure case that was getting in the
>> way, but now it can still fail like this:
>>
>
> Good progress! :-)
>
>> nfs_writeback_update_inode sets size 4096 w/ NFS_INO_INVALID_ATTR set, and sets NFS_INO_LAYOUTCOMMIT
>> 1st nfs_getattr -> pnfs_layoutcommit_inode starts, clears layoutcommit flag, sets NFS_INO_LAYOUTCOMMITTING
>> nfs_writeback_update_inode sets size 8192 w/ NFS_INO_INVALID_ATTR set, and sets NFS_INO_LAYOUTCOMMIT
>> 1st nfs_getattr -> nfs4_layoutcommit_release sets size 4096, NFS_INO_INVALID_ATTR set, clears NFS_INO_LAYOUTCOMMITTING
>> 1st nfs_getattr -> __revalidate_inode sets size 4096, NFS_INO_INVALID_ATTR not set; cache is valid
>> 2nd nfs_getattr immediately returns 4096 even though NFS_INO_LAYOUTCOMMIT is set
>
> Is this being tested on top of the current linux-next/testing?
> Normally, I’d expect
> http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=10b7e9ad44881fcd46ac24eb7374377c6e8962ed
> to cause 1st nfs_getattr() to not declare the cache valid.

Yes, this is on your linux-next branch.

When the 1st nfs_getattr() goes through nfs_update_inode() the second time
(during __revalidate_inode), NFS_INO_INVALID_ATTR is never set by anything,
since all the attributes returned match the cache.  So even though
NFS_INO_LAYOUTCOMMIT is set, and the cache_validity variable is "false",
the NFS_INO_INVALID_ATTR is never set in the "invalid" local variable.

Should pnfs_layoutcommit_outstanding() always set NFS_INO_INVALID_ATTR?

Ben
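The interleaving in the trace, and the check Ben asks about, can be replayed in a deliberately simplified C model. The flag names mirror those in the thread, but `struct model_inode`, the `M_*` bits and the `model_*()` functions are hypothetical; the real nfs_update_inode() and pnfs_layoutcommit_outstanding() track far more state, and this sketch says nothing about where a real fix should live.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bits named after the flags in the trace; not kernel values. */
#define M_INVALID_ATTR     0x1u /* cached attributes must be revalidated */
#define M_LAYOUTCOMMIT     0x2u /* a LAYOUTCOMMIT is still needed */
#define M_LAYOUTCOMMITTING 0x4u /* a LAYOUTCOMMIT is in flight */

struct model_inode {
	uint64_t size;
	unsigned int cache_validity;
};

/*
 * Model of processing a GETATTR reply: the attribute cache is marked
 * invalid only if the server's size differs from the cached one, unless
 * the proposed guard keeps it invalid while a LAYOUTCOMMIT is pending.
 */
static void model_update_inode(struct model_inode *ino, uint64_t srv_size,
			       bool guard)
{
	unsigned int invalid = 0;

	if (srv_size != ino->size) {
		ino->size = srv_size;
		invalid |= M_INVALID_ATTR;
	}
	if (guard && (ino->cache_validity &
		      (M_LAYOUTCOMMIT | M_LAYOUTCOMMITTING)))
		invalid |= M_INVALID_ATTR;

	/* A fresh reply replaces the previous INVALID_ATTR state. */
	ino->cache_validity = (ino->cache_validity & ~M_INVALID_ATTR) | invalid;
}

/* Replays the interleaving from the trace on a model inode. */
static struct model_inode model_run_trace(bool guard)
{
	struct model_inode ino = { 0, 0 };

	/* writeback done: size 4096, INVALID_ATTR and LAYOUTCOMMIT set */
	ino.size = 4096;
	ino.cache_validity |= M_INVALID_ATTR | M_LAYOUTCOMMIT;
	/* 1st getattr starts LAYOUTCOMMIT #1 */
	ino.cache_validity &= ~M_LAYOUTCOMMIT;
	ino.cache_validity |= M_LAYOUTCOMMITTING;
	/* second writeback done: size 8192, LAYOUTCOMMIT needed again */
	ino.size = 8192;
	ino.cache_validity |= M_INVALID_ATTR | M_LAYOUTCOMMIT;
	/* LAYOUTCOMMIT #1 released: size pulled back to 4096 */
	ino.size = 4096;
	ino.cache_validity |= M_INVALID_ATTR;
	ino.cache_validity &= ~M_LAYOUTCOMMITTING;
	/* __revalidate_inode: the server still reports 4096 bytes */
	model_update_inode(&ino, 4096, guard);
	return ino;
}

/* A getattr may answer from cache only if INVALID_ATTR is clear. */
static bool model_getattr_uses_cache(const struct model_inode *ino)
{
	return (ino->cache_validity & M_INVALID_ATTR) == 0;
}
```

With guard set to false, model_run_trace() ends with a 4096-byte size, the LAYOUTCOMMIT bit still set and model_getattr_uses_cache() returning true, matching the failure described above. With guard set to true, the matching GETATTR reply no longer clears the invalid-attribute bit, so the second getattr would have to revalidate.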

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics
  2016-07-28  9:47                                                                                                         ` Benjamin Coddington
@ 2016-07-28 12:31                                                                                                           ` Trond Myklebust
  2016-07-28 14:04                                                                                                             ` Trond Myklebust
  2016-07-28 15:33                                                                                                             ` Benjamin Coddington
  0 siblings, 2 replies; 69+ messages in thread
From: Trond Myklebust @ 2016-07-28 12:31 UTC (permalink / raw)
  To: Coddington Benjamin; +Cc: hch, List Linux NFS Mailing

> On Jul 28, 2016, at 05:47, Benjamin Coddington <bcodding@redhat.com> wrote:
>
>
> On 27 Jul 2016, at 14:05, Trond Myklebust wrote:
>
>>> On Jul 27, 2016, at 12:14, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>
>>> On 27 Jul 2016, at 8:31, Trond Myklebust wrote:
>>>
>>>>> On Jul 27, 2016, at 08:15, Trond Myklebust <trondmy@primarydata.com> wrote:
>>>>>
>>>>>
>>>>>> On Jul 27, 2016, at 07:55, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>>>
>>>>>> After adding more debugging, I see that all of that is working correctly,
>>>>>> but the first LAYOUTCOMMIT is taking the size back down to 4096 from the
>>>>>> last nfs_writeback_done(), and the second LAYOUTCOMMIT never brings it back
>>>>>> up again.
>>>>>>
>>>>>
>>>>> Excellent! Thanks for debugging that.
>>>>>
>>>>>> Now I see that we should be marking the block extents as written atomically with
>>>>>> setting LAYOUTCOMMIT and nfsi->layout->plh_lwb, otherwise a LAYOUTCOMMIT can
>>>>>> collect extents just added from the next bl_write_cleanup().  Then, the next
>>>>>> LAYOUTCOMMIT fails, and all we're left with is the size from the first
>>>>>> LAYOUTCOMMIT.  Not sure if that particular problem is the whole fix, but
>>>>>> that's something to work on.
>>>>>>
>>>>>> I see ways to fix that:
>>>>>>
>>>>>> - make a new pnfs_set_layoutcommit_locked() that can be used to call
>>>>>>   ext_tree_mark_written() inside the i_lock
>>>>>>
>>>>>> - make another pnfs_layoutdriver_type operation to be used within
>>>>>>   pnfs_set_layoutcommit (mark_layoutcommit? set_layoutcommit?), and call
>>>>>>   ext_tree_mark_written() within that.
>>>>>>
>>>>>> - have .prepare_layoutcommit return a new positive plh_lwb that would
>>>>>>   extend the current LAYOUTCOMMIT
>>>>>>
>>>>>> - make ext_tree_prepare_commit only encode up to plh_lwb
>>>>>
>>>>> I see no reason why ext_tree_prepare_commit() shouldn’t be allowed to extend the args->lastbytewritten. This is a metadata operation that is owned by the pNFS layout driver.
>>>>> The only thing I’d note is you should then rewrite the failure case in pnfs_layoutcommit_inode() so that it doesn’t rely on the saved “end_pos”, but uses args->lastbytewritten instead (with a comment to the effect why)…
>>>>
>>>> In fact, given the potential for races here, I think the right thing to do is to have ext_tree_prepare_commit() always set the correct value for args->lastbytewritten.
>>>
>>> OK, that has cleared up that common failure case that was getting in the
>>> way, but now it can still fail like this:
>>>
>>
>> Good progress! :-)
>>
>>> nfs_writeback_update_inode sets size 4096 w/ NFS_INO_INVALID_ATTR set, and sets NFS_INO_LAYOUTCOMMIT
>>> 1st nfs_getattr -> pnfs_layoutcommit_inode starts, clears layoutcommit flag, sets NFS_INO_LAYOUTCOMMITTING
>>> nfs_writeback_update_inode sets size 8192 w/ NFS_INO_INVALID_ATTR set, and sets NFS_INO_LAYOUTCOMMIT
>>> 1st nfs_getattr -> nfs4_layoutcommit_release sets size 4096, NFS_INO_INVALID_ATTR set, clears NFS_INO_LAYOUTCOMMITTING
>>> 1st nfs_getattr -> __revalidate_inode sets size 4096, NFS_INO_INVALID_ATTR not set; cache is valid
>>> 2nd nfs_getattr immediately returns 4096 even though NFS_INO_LAYOUTCOMMIT is set
>>
>> Is this being tested on top of the current linux-next/testing? Normally, I’d expect http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=10b7e9ad44881fcd46ac24eb7374377c6e8962ed to cause 1st nfs_getattr() to not declare the cache valid.
>
> Yes, this is on your linux-next branch.
>
> When the 1st nfs_getattr() goes through nfs_update_inode() the second time
> (during __revalidate_inode), NFS_INO_INVALID_ATTR is never set by anything,
> since all the attributes returned match the cache.  So even though
> NFS_INO_LAYOUTCOMMIT is set, and the cache_validity variable is "false",
> the NFS_INO_INVALID_ATTR is never set in the "invalid" local variable.
>
> Should pnfs_layoutcommit_outstanding() always set NFS_INO_INVALID_ATTR?
>
> Ben

nfs_post_op_update_inode_locked() should be doing that as part of the callchain in nfs_writeback_update_inode().

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics
  2016-07-28 12:31                                                                                                           ` Trond Myklebust
@ 2016-07-28 14:04                                                                                                             ` Trond Myklebust
  2016-07-28 15:38                                                                                                               ` Benjamin Coddington
  2016-07-28 15:33                                                                                                             ` Benjamin Coddington
  1 sibling, 1 reply; 69+ messages in thread
From: Trond Myklebust @ 2016-07-28 14:04 UTC (permalink / raw)
  To: Coddington Benjamin; +Cc: hch, List Linux NFS Mailing

> On Jul 28, 2016, at 08:31, Trond Myklebust <trondmy@primarydata.com> wrote:
>
>
>> On Jul 28, 2016, at 05:47, Benjamin Coddington <bcodding@redhat.com> wrote:
>>
>>
>> On 27 Jul 2016, at 14:05, Trond Myklebust wrote:
>>
>>>> On Jul 27, 2016, at 12:14, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>
>>>> On 27 Jul 2016, at 8:31, Trond Myklebust wrote:
>>>>
>>>>>> On Jul 27, 2016, at 08:15, Trond Myklebust <trondmy@primarydata.com> wrote:
>>>>>>
>>>>>>
>>>>>>> On Jul 27, 2016, at 07:55, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>>>>
>>>>>>> After adding more debugging, I see that all of that is working correctly,
>>>>>>> but the first LAYOUTCOMMIT is taking the size back down to 4096 from the
>>>>>>> last nfs_writeback_done(), and the second LAYOUTCOMMIT never brings it back
>>>>>>> up again.
>>>>>>>
>>>>>>
>>>>>> Excellent! Thanks for debugging that.
>>>>>>
>>>>>>> Now I see that we should be marking the block extents as written atomically with
>>>>>>> setting LAYOUTCOMMIT and nfsi->layout->plh_lwb, otherwise a LAYOUTCOMMIT can
>>>>>>> collect extents just added from the next bl_write_cleanup().  Then, the next
>>>>>>> LAYOUTCOMMIT fails, and all we're left with is the size from the first
>>>>>>> LAYOUTCOMMIT.  Not sure if that particular problem is the whole fix, but
>>>>>>> that's something to work on.
>>>>>>>
>>>>>>> I see ways to fix that:
>>>>>>>
>>>>>>> - make a new pnfs_set_layoutcommit_locked() that can be used to call
>>>>>>>   ext_tree_mark_written() inside the i_lock
>>>>>>>
>>>>>>> - make another pnfs_layoutdriver_type operation to be used within
>>>>>>>   pnfs_set_layoutcommit (mark_layoutcommit? set_layoutcommit?), and call
>>>>>>>   ext_tree_mark_written() within that.
>>>>>>>
>>>>>>> - have .prepare_layoutcommit return a new positive plh_lwb that would
>>>>>>>   extend the current LAYOUTCOMMIT
>>>>>>>
>>>>>>> - make ext_tree_prepare_commit only encode up to plh_lwb
>>>>>>
>>>>>> I see no reason why ext_tree_prepare_commit() shouldn’t be allowed to extend the args->lastbytewritten. This is a metadata operation that is owned by the pNFS layout driver.
>>>>>> The only thing I’d note is you should then rewrite the failure case in pnfs_layoutcommit_inode() so that it doesn’t rely on the saved “end_pos”, but uses args->lastbytewritten instead (with a comment to the effect why)…
>>>>>
>>>>> In fact, given the potential for races here, I think the right thing to do is to have ext_tree_prepare_commit() always set the correct value for args->lastbytewritten.
>>>>
>>>> OK, that has cleared up that common failure case that was getting in the
>>>> way, but now it can still fail like this:
>>>>
>>>
>>> Good progress! :-)
>>>
>>>> nfs_writeback_update_inode sets size 4096 w/ NFS_INO_INVALID_ATTR set, and sets NFS_INO_LAYOUTCOMMIT
>>>> 1st nfs_getattr -> pnfs_layoutcommit_inode starts, clears layoutcommit flag, sets NFS_INO_LAYOUTCOMMITTING
>>>> nfs_writeback_update_inode sets size 8192 w/ NFS_INO_INVALID_ATTR set, and sets NFS_INO_LAYOUTCOMMIT
>>>> 1st nfs_getattr -> nfs4_layoutcommit_release sets size 4096, NFS_INO_INVALID_ATTR set, clears NFS_INO_LAYOUTCOMMITTING
>>>> 1st nfs_getattr -> __revalidate_inode sets size 4096, NFS_INO_INVALID_ATTR not set; cache is valid
>>>> 2nd nfs_getattr immediately returns 4096 even though NFS_INO_LAYOUTCOMMIT is set
>>>
>>> Is this being tested on top of the current linux-next/testing? Normally, I’d expect http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=10b7e9ad44881fcd46ac24eb7374377c6e8962ed to cause 1st nfs_getattr() to not declare the cache valid.
>>
>> Yes, this is on your linux-next branch.
>>
>> When the 1st nfs_getattr() goes through nfs_update_inode() the second time
>> (during __revalidate_inode), NFS_INO_INVALID_ATTR is never set by anything,
>> since all the attributes returned match the cache.  So even though
>> NFS_INO_LAYOUTCOMMIT is set, and the cache_validity variable is "false",
>> the NFS_INO_INVALID_ATTR is never set in the "invalid" local variable.
>>
>> Should pnfs_layoutcommit_outstanding() always set NFS_INO_INVALID_ATTR?
>>
>> Ben
>
> nfs_post_op_update_inode_locked() should be doing that as part of the callchain in nfs_writeback_update_inode().
>

By the way, I just noticed that nothing appears to be using the attributes we retrieve as part of the layoutcommit call. Does adding an nfs_refresh_inode() to the “success” path in nfs4_layoutcommit_done() perhaps help?

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics
  2016-07-28 12:31                                                                                                           ` Trond Myklebust
  2016-07-28 14:04                                                                                                             ` Trond Myklebust
@ 2016-07-28 15:33                                                                                                             ` Benjamin Coddington
  2016-07-28 15:36                                                                                                               ` Trond Myklebust
  1 sibling, 1 reply; 69+ messages in thread
From: Benjamin Coddington @ 2016-07-28 15:33 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: hch, List Linux NFS Mailing

On 28 Jul 2016, at 8:31, Trond Myklebust wrote:

>> On Jul 28, 2016, at 05:47, Benjamin Coddington <bcodding@redhat.com> wrote:
>>
>>
>> On 27 Jul 2016, at 14:05, Trond Myklebust wrote:
>>
>>>> On Jul 27, 2016, at 12:14, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>
>>>> On 27 Jul 2016, at 8:31, Trond Myklebust wrote:
>>>>
>>>>>> On Jul 27, 2016, at 08:15, Trond Myklebust <trondmy@primarydata.com> wrote:
>>>>>>
>>>>>>
>>>>>>> On Jul 27, 2016, at 07:55, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>>>>
>>>>>>> After adding more debugging, I see that all of that is working correctly,
>>>>>>> but the first LAYOUTCOMMIT is taking the size back down to 4096 from the
>>>>>>> last nfs_writeback_done(), and the second LAYOUTCOMMIT never brings it back
>>>>>>> up again.
>>>>>>>
>>>>>>
>>>>>> Excellent! Thanks for debugging that.
>>>>>>
>>>>>>> Now I see that we should be marking the block extents as written atomically with
>>>>>>> setting LAYOUTCOMMIT and nfsi->layout->plh_lwb, otherwise a LAYOUTCOMMIT can
>>>>>>> collect extents just added from the next bl_write_cleanup().  Then, the next
>>>>>>> LAYOUTCOMMIT fails, and all we're left with is the size from the first
>>>>>>> LAYOUTCOMMIT.  Not sure if that particular problem is the whole fix, but
>>>>>>> that's something to work on.
>>>>>>>
>>>>>>> I see ways to fix that:
>>>>>>>
>>>>>>> - make a new pnfs_set_layoutcommit_locked() that can be used to call
>>>>>>>   ext_tree_mark_written() inside the i_lock
>>>>>>>
>>>>>>> - make another pnfs_layoutdriver_type operation to be used within
>>>>>>>   pnfs_set_layoutcommit (mark_layoutcommit? set_layoutcommit?), and call
>>>>>>>   ext_tree_mark_written() within that.
>>>>>>>
>>>>>>> - have .prepare_layoutcommit return a new positive plh_lwb that would
>>>>>>>   extend the current LAYOUTCOMMIT
>>>>>>>
>>>>>>> - make ext_tree_prepare_commit only encode up to plh_lwb
>>>>>>
>>>>>> I see no reason why ext_tree_prepare_commit() shouldn’t be
>>>>>> allowed to extend the args->lastbytewritten. This is a metadata
>>>>>> operation that is owned by the pNFS layout driver.
>>>>>> The only thing I’d note is you should then rewrite the failure
>>>>>> case in pnfs_layoutcommit_inode() so that it doesn’t rely on
>>>>>> the saved “end_pos”, but uses args->lastbytewritten instead
>>>>>> (with a comment to the effect why)…
>>>>>
>>>>> In fact, given the potential for races here, I think the right
>>>>> thing to do is to have ext_tree_prepare_commit() always set the
>>>>> correct value for args->lastbytewritten.
>>>>
>>>> OK, that has cleared up that common failure case that was getting in the
>>>> way, but now it can still fail like this:
>>>>
>>>
>>> Good progress! :-)
>>>
>>>> nfs_writeback_update_inode sets size 4096 w/ NFS_INO_INVALID_ATTR set, and sets NFS_INO_LAYOUTCOMMIT
>>>> 1st nfs_getattr -> pnfs_layoutcommit_inode starts, clears layoutcommit flag, sets NFS_INO_LAYOUTCOMMITTING
>>>> nfs_writeback_update_inode sets size 8192 w/ NFS_INO_INVALID_ATTR set, and sets NFS_INO_LAYOUTCOMMIT
>>>> 1st nfs_getattr -> nfs4_layoutcommit_release sets size 4096, NFS_INO_INVALID_ATTR set, clears NFS_INO_LAYOUTCOMMITTING
>>>> 1st nfs_getattr -> __revalidate_inode sets size 4096, NFS_INO_INVALID_ATTR not set; cache is valid
>>>> 2nd nfs_getattr immediately returns 4096 even though NFS_INO_LAYOUTCOMMIT is set
>>>
>>> Is this being tested on top of the current linux-next/testing?
>>> Normally, I’d expect
>>> http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=10b7e9ad44881fcd46ac24eb7374377c6e8962ed
>>> to cause 1st nfs_getattr() to not declare the cache valid.
>>
>> Yes, this is on your linux-next branch.
>>
>> When the 1st nfs_getattr() goes through nfs_update_inode() the second time
>> (during __revalidate_inode), NFS_INO_INVALID_ATTR is never set by anything,
>> since all the attributes returned match the cache.  So even though
>> NFS_INO_LAYOUTCOMMIT is set, and the cache_validity variable is "false",
>> the NFS_INO_INVALID_ATTR is never set in the "invalid" local variable.
>>
>> Should pnfs_layoutcommit_outstanding() always set NFS_INO_INVALID_ATTR?
>>
>> Ben
>
> nfs_post_op_update_inode_locked() should be doing that as part of the
> callchain in nfs_writeback_update_inode().

And it is, and the bit persists through the next layoutcommit. It is the
next GETATTR response that finds all the attributes unchanged and clears
the bit.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics
  2016-07-28 15:33                                                                                                             ` Benjamin Coddington
@ 2016-07-28 15:36                                                                                                               ` Trond Myklebust
  2016-07-28 16:40                                                                                                                 ` Benjamin Coddington
  0 siblings, 1 reply; 69+ messages in thread
From: Trond Myklebust @ 2016-07-28 15:36 UTC (permalink / raw)
  To: Coddington Benjamin; +Cc: hch, List Linux NFS Mailing

> On Jul 28, 2016, at 11:33, Benjamin Coddington <bcodding@redhat.com> wrote:
>
> On 28 Jul 2016, at 8:31, Trond Myklebust wrote:
>
>>> On Jul 28, 2016, at 05:47, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>
>>>
>>> On 27 Jul 2016, at 14:05, Trond Myklebust wrote:
>>>
>>>>> On Jul 27, 2016, at 12:14, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>>
>>>>> On 27 Jul 2016, at 8:31, Trond Myklebust wrote:
>>>>>
>>>>>>> On Jul 27, 2016, at 08:15, Trond Myklebust <trondmy@primarydata.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>> On Jul 27, 2016, at 07:55, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>>>>>
>>>>>>>> After adding more debugging, I see that all of that is working correctly,
>>>>>>>> but the first LAYOUTCOMMIT is taking the size back down to 4096 from the
>>>>>>>> last nfs_writeback_done(), and the second LAYOUTCOMMIT never brings it back
>>>>>>>> up again.
>>>>>>>>
>>>>>>>
>>>>>>> Excellent! Thanks for debugging that.
>>>>>>>
>>>>>>>> Now I see that we should be marking the block extents as written atomically with
>>>>>>>> setting LAYOUTCOMMIT and nfsi->layout->plh_lwb, otherwise a LAYOUTCOMMIT can
>>>>>>>> collect extents just added from the next bl_write_cleanup().  Then, the next
>>>>>>>> LAYOUTCOMMIT fails, and all we're left with is the size from the first
>>>>>>>> LAYOUTCOMMIT.  Not sure if that particular problem is the whole fix, but
>>>>>>>> that's something to work on.
>>>>>>>>
>>>>>>>> I see ways to fix that:
>>>>>>>>
>>>>>>>> - make a new pnfs_set_layoutcommit_locked() that can be used to call
>>>>>>>>  ext_tree_mark_written() inside the i_lock
>>>>>>>>
>>>>>>>> - make another pnfs_layoutdriver_type operation to be used within
>>>>>>>>  pnfs_set_layoutcommit (mark_layoutcommit? set_layoutcommit?), and call
>>>>>>>>  ext_tree_mark_written() within that..
>>>>>>>>
>>>>>>>> - have .prepare_layoutcommit return a new positive plh_lwb that would
>>>>>>>>  extend the current LAYOUTCOMMIT
>>>>>>>>
>>>>>>>> - make ext_tree_prepare_commit only encode up to plh_lwb
>>>>>>>
>>>>>>> I see no reason why ext_tree_prepare_commit() shouldn’t be allowed to extend the args->lastbytewritten. This is a metadata operation that is owned by the pNFS layout driver.
>>>>>>> The only thing I’d note is you should then rewrite the failure case in pnfs_layoutcommit_inode() so that it doesn’t rely on the saved “end_pos”, but uses args->lastbytewritten instead (with a comment to the effect why)…
>>>>>>
>>>>>> In fact, given the potential for races here, I think the right thing to do is to have ext_tree_prepare_commit() always set the correct value for args->lastbytewritten.
>>>>>
>>>>> OK, that has cleared up that common failure case that was getting in the
>>>>> way, but now it can still fail like this:
>>>>>
>>>>
>>>> Good progress! :-)
>>>>
>>>>> nfs_writeback_update_inode sets size 4096 w/ NFS_INO_INVALID_ATTR set, and sets NFS_INO_LAYOUTCOMMIT
>>>>> 1st nfs_getattr -> pnfs_layoutcommit_inode starts, clears layoutcommit flag sets NFS_INO_LAYOUTCOMMITING
>>>>> nfs_writeback_update_inode sets size 8192 w/ NFS_INO_INVALID_ATTR set, and sets NFS_INO_LAYOUTCOMMIT
>>>>> 1st nfs_getattr -> nfs4_layoutcommit_release sets size 4096, NFS_INO_INVALID_ATTR set, clears NFS_INO_LAYOUTCOMMITTING
>>>>> 1st nfs_getattr -> __revalidate_inode sets size 4096, NFS_INO_INVALID_ATTR not set.. cache is valid
>>>>> 2nd nfs_getattr immediately returns 4096 even though NFS_INO_LAYOUTCOMMIT
>>>>
>>>> Is this being tested on top of the current linux-next/testing? Normally, I’d expect http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=10b7e9ad44881fcd46ac24eb7374377c6e8962ed to cause 1st nfs_getattr() to not declare the cache valid.
>>>
>>> Yes, this is on your linux-next branch.
>>>
>>> When the 1st nfs_getattr() goes through nfs_update_inode() the second time
>>> (during __revalidate_inode), NFS_INO_INVALID_ATTR is never set by anything,
>>> since all the attributes returned match the cache.  So even though
>>> NFS_INO_LAYOUTCOMMIT is set, and the cache_validity variable is "false",
>>> the NFS_INO_INVALID_ATTR is never set in the "invalid" local variable.
>>>
>>> Should pnfs_layoutcommit_outstanding() always set NFS_INO_INVALID_ATTR?
>>>
>>> Ben
>>
>> nfs_post_op_update_inode_locked() should be doing that as part of the callchain in nfs_writeback_update_inode().
>
> And it is, and the bit persists through the next layoutcommit, it is the next GETATTR response that finds that all the attributes are the same and the bit is cleared.
>

So what if we require that nfsi->cache_validity be set to save_cache_validity & NFS_INO_INVALID_ATTR at a minimum if pnfs_layoutcommit_outstanding()?
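
One of the suggestions quoted in this thread is for ext_tree_prepare_commit() to always set args->lastbytewritten itself, from the extents it actually encodes, so that the reported value always matches the extents in that commit rather than a separately saved end_pos. A minimal sketch of that computation follows. It assumes a flat array of byte-granular extents; `struct toy_extent` and `toy_last_byte_written()` are hypothetical and are not the blocklayout extent-tree API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent record covering bytes [start, start + len). */
struct toy_extent {
	uint64_t start;
	uint64_t len;
	int written;	/* nonzero if this extent is part of the commit */
};

/*
 * Returns the offset of the last byte covered by any extent that is part
 * of this commit, i.e. the value this sketch would report as
 * lastbytewritten.  If no extent qualifies, returns @fallback.
 */
uint64_t toy_last_byte_written(const struct toy_extent *ext, size_t n,
			       uint64_t fallback)
{
	uint64_t last = 0;
	int found = 0;

	assert(ext != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		uint64_t end;

		if (!ext[i].written || ext[i].len == 0)
			continue;
		end = ext[i].start + ext[i].len - 1;	/* inclusive end */
		if (!found || end > last) {
			last = end;
			found = 1;
		}
	}
	return found ? last : fallback;
}
```

Suppose extents covering bytes 0-8191 are part of the commit and a third extent at 8192 is not. The function then reports 8191. The extra extent does not move the value, so the reported value only ever describes extents in the same commit.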

* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics
  2016-07-28 14:04                                                                                                             ` Trond Myklebust
@ 2016-07-28 15:38                                                                                                               ` Benjamin Coddington
  2016-07-28 15:39                                                                                                                 ` Trond Myklebust
  0 siblings, 1 reply; 69+ messages in thread
From: Benjamin Coddington @ 2016-07-28 15:38 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: hch, List Linux NFS Mailing



On 28 Jul 2016, at 10:04, Trond Myklebust wrote:

>> On Jul 28, 2016, at 08:31, Trond Myklebust <trondmy@primarydata.com> wrote:
>>
>>
>>> On Jul 28, 2016, at 05:47, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>
>>>
>>> On 27 Jul 2016, at 14:05, Trond Myklebust wrote:
>>>
>>>>> On Jul 27, 2016, at 12:14, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>>
>>>>> On 27 Jul 2016, at 8:31, Trond Myklebust wrote:
>>>>>
>>>>>>> On Jul 27, 2016, at 08:15, Trond Myklebust <trondmy@primarydata.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>> On Jul 27, 2016, at 07:55, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>>>>>
>>>>>>>> After adding more debugging, I see that all of that is working correctly,
>>>>>>>> but the first LAYOUTCOMMIT is taking the size back down to 4096 from the
>>>>>>>> last nfs_writeback_done(), and the second LAYOUTCOMMIT never brings it back
>>>>>>>> up again.
>>>>>>>>
>>>>>>>
>>>>>>> Excellent! Thanks for debugging that.
>>>>>>>
>>>>>>>> Now I see that we should be marking the block extents as written atomically with
>>>>>>>> setting LAYOUTCOMMIT and nfsi->layout->plh_lwb, otherwise a LAYOUTCOMMIT can
>>>>>>>> collect extents just added from the next bl_write_cleanup().  Then, the next
>>>>>>>> LAYOUTCOMMIT fails, and all we're left with is the size from the first
>>>>>>>> LAYOUTCOMMIT.  Not sure if that particular problem is the whole fix, but
>>>>>>>> that's something to work on.
>>>>>>>>
>>>>>>>> I see ways to fix that:
>>>>>>>>
>>>>>>>> - make a new pnfs_set_layoutcommit_locked() that can be used to call
>>>>>>>>  ext_tree_mark_written() inside the i_lock
>>>>>>>>
>>>>>>>> - make another pnfs_layoutdriver_type operation to be used within
>>>>>>>>  pnfs_set_layoutcommit (mark_layoutcommit? set_layoutcommit?), and call
>>>>>>>>  ext_tree_mark_written() within that..
>>>>>>>>
>>>>>>>> - have .prepare_layoutcommit return a new positive plh_lwb that would
>>>>>>>>  extend the current LAYOUTCOMMIT
>>>>>>>>
>>>>>>>> - make ext_tree_prepare_commit only encode up to plh_lwb
>>>>>>>
>>>>>>> I see no reason why ext_tree_prepare_commit() shouldn’t be allowed to extend the args->lastbytewritten. This is a metadata operation that is owned by the pNFS layout driver.
>>>>>>> The only thing I’d note is you should then rewrite the failure case in pnfs_layoutcommit_inode() so that it doesn’t rely on the saved “end_pos”, but uses args->lastbytewritten instead (with a comment to the effect why)…
>>>>>>
>>>>>> In fact, given the potential for races here, I think the right thing to do is to have ext_tree_prepare_commit() always set the correct value for args->lastbytewritten.
>>>>>
>>>>> OK, that has cleared up that common failure case that was getting in the
>>>>> way, but now it can still fail like this:
>>>>>
>>>>
>>>> Good progress! :-)
>>>>
>>>>> nfs_writeback_update_inode sets size 4096 w/ NFS_INO_INVALID_ATTR set, and sets NFS_INO_LAYOUTCOMMIT
>>>>> 1st nfs_getattr -> pnfs_layoutcommit_inode starts, clears layoutcommit flag sets NFS_INO_LAYOUTCOMMITING
>>>>> nfs_writeback_update_inode sets size 8192 w/ NFS_INO_INVALID_ATTR set, and sets NFS_INO_LAYOUTCOMMIT
>>>>> 1st nfs_getattr -> nfs4_layoutcommit_release sets size 4096, NFS_INO_INVALID_ATTR set, clears NFS_INO_LAYOUTCOMMITTING
>>>>> 1st nfs_getattr -> __revalidate_inode sets size 4096, NFS_INO_INVALID_ATTR not set.. cache is valid
>>>>> 2nd nfs_getattr immediately returns 4096 even though NFS_INO_LAYOUTCOMMIT
>>>>
>>>> Is this being tested on top of the current linux-next/testing? Normally, I’d expect http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=10b7e9ad44881fcd46ac24eb7374377c6e8962ed to cause 1st nfs_getattr() to not declare the cache valid.
>>>
>>> Yes, this is on your linux-next branch.
>>>
>>> When the 1st nfs_getattr() goes through nfs_update_inode() the second time
>>> (during __revalidate_inode), NFS_INO_INVALID_ATTR is never set by anything,
>>> since all the attributes returned match the cache.  So even though
>>> NFS_INO_LAYOUTCOMMIT is set, and the cache_validity variable is "false",
>>> the NFS_INO_INVALID_ATTR is never set in the "invalid" local variable.
>>>
>>> Should pnfs_layoutcommit_outstanding() always set NFS_INO_INVALID_ATTR?
>>>
>>> Ben
>>
>> nfs_post_op_update_inode_locked() should be doing that as part of the callchain in nfs_writeback_update_inode().
>>
>
> By the way. I just noticed that nothing appears to be using the attributes we retrieve as part of the layoutcommit call. Does adding a nfs_refresh_inode() to the “success” path in nfs4_layoutcommit_done() perhaps help?

We do it in layoutcommit_release:

  nfs4_layoutcommit_done [nfsv4]() {
    ...
  }
  nfs4_layoutcommit_release [nfsv4]() {
    ...
    nfs_post_op_update_inode_force_wcc [nfs]() {
      nfs_post_op_update_inode_force_wcc_locked [nfs]() {
        nfs_post_op_update_inode_locked [nfs]() {
          nfs4_have_delegation [nfsv4]() {
            nfs4_do_check_delegation [nfsv4]();
          }
          nfs_refresh_inode_locked [nfs]() {
            nfs_update_inode [nfs]() {


Should I still try adding it in nfs4_layoutcommit_done()?

Ben
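
The failing sequence in the trace quoted above can be replayed deterministically with a small single-threaded model. This is a sketch under stated assumptions, not kernel code. `struct toy_inode`, the `TOY_*` bits and the `toy_*` helpers are hypothetical; they only borrow their roles from nfs_writeback_update_inode(), pnfs_layoutcommit_inode(), nfs4_layoutcommit_release() and __revalidate_inode. The model assumes, as the trace reports, that the LAYOUTCOMMIT reply and the following GETATTR both return a size of 4096. It also assumes that a getattr trusts the cache whenever the invalid bit is clear.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for a toy model; not the kernel's values. */
#define TOY_INVALID_ATTR     0x1u  /* cached attributes are stale        */
#define TOY_LAYOUTCOMMIT     0x2u  /* a LAYOUTCOMMIT still needs sending */
#define TOY_LAYOUTCOMMITTING 0x4u  /* a LAYOUTCOMMIT is in flight        */

struct toy_inode {
	long long size;        /* cached file size                */
	unsigned int validity; /* TOY_INVALID_ATTR                */
	unsigned int flags;    /* TOY_LAYOUTCOMMIT{,TING}         */
};

/* A WRITE completes (role of nfs_writeback_update_inode). */
static void toy_writeback_update(struct toy_inode *ino, long long end)
{
	if (end > ino->size)
		ino->size = end;
	ino->validity |= TOY_INVALID_ATTR;
	ino->flags |= TOY_LAYOUTCOMMIT;
}

/* A LAYOUTCOMMIT is sent (role of pnfs_layoutcommit_inode). */
static void toy_layoutcommit_start(struct toy_inode *ino)
{
	assert(ino->flags & TOY_LAYOUTCOMMIT);
	ino->flags &= ~TOY_LAYOUTCOMMIT;
	ino->flags |= TOY_LAYOUTCOMMITTING;
}

/* The reply is applied (role of nfs4_layoutcommit_release): the size
 * reported for the first commit overwrites the cached value. */
static void toy_layoutcommit_release(struct toy_inode *ino, long long srv_size)
{
	ino->size = srv_size;
	ino->validity |= TOY_INVALID_ATTR;
	ino->flags &= ~TOY_LAYOUTCOMMITTING;
}

/* Old rule: the GETATTR reply is taken as authoritative and clears
 * TOY_INVALID_ATTR, even though a LAYOUTCOMMIT is still pending. */
static void toy_revalidate_old(struct toy_inode *ino, long long srv_size)
{
	ino->size = srv_size;
	ino->validity &= ~TOY_INVALID_ATTR;
}

/* A getattr trusts the cache unless TOY_INVALID_ATTR is set. */
static bool toy_cache_trusted(const struct toy_inode *ino)
{
	return !(ino->validity & TOY_INVALID_ATTR);
}

/* Replays the trace; returns the size the 2nd getattr would report
 * from cache, or -1 if it would have to revalidate. */
long long toy_replay_race(struct toy_inode *ino)
{
	*ino = (struct toy_inode){ 0 };
	toy_writeback_update(ino, 4096);     /* 1st WRITE done             */
	toy_layoutcommit_start(ino);         /* 1st getattr: LAYOUTCOMMIT  */
	toy_writeback_update(ino, 8192);     /* 2nd WRITE races in         */
	toy_layoutcommit_release(ino, 4096); /* reply covers 4096 only     */
	toy_revalidate_old(ino, 4096);       /* GETATTR matches the cache  */
	return toy_cache_trusted(ino) ? ino->size : -1;
}
```

Calling `toy_replay_race(&ino)` returns 4096 while `ino.flags` still has `TOY_LAYOUTCOMMIT` set. The second WRITE's 8192 bytes are hidden because the matching GETATTR reply cleared the only bit that would have forced another round trip.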

* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics
  2016-07-28 15:38                                                                                                               ` Benjamin Coddington
@ 2016-07-28 15:39                                                                                                                 ` Trond Myklebust
  0 siblings, 0 replies; 69+ messages in thread
From: Trond Myklebust @ 2016-07-28 15:39 UTC (permalink / raw)
  To: Coddington Benjamin; +Cc: hch, List Linux NFS Mailing

> On Jul 28, 2016, at 11:38, Benjamin Coddington <bcodding@redhat.com> wrote:
>
>
>
> On 28 Jul 2016, at 10:04, Trond Myklebust wrote:
>
>>> On Jul 28, 2016, at 08:31, Trond Myklebust <trondmy@primarydata.com> wrote:
>>>
>>>
>>>> On Jul 28, 2016, at 05:47, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>
>>>>
>>>> On 27 Jul 2016, at 14:05, Trond Myklebust wrote:
>>>>
>>>>>> On Jul 27, 2016, at 12:14, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>>>
>>>>>> On 27 Jul 2016, at 8:31, Trond Myklebust wrote:
>>>>>>
>>>>>>>> On Jul 27, 2016, at 08:15, Trond Myklebust <trondmy@primarydata.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>> On Jul 27, 2016, at 07:55, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>>>>>>
>>>>>>>>> After adding more debugging, I see that all of that is working correctly,
>>>>>>>>> but the first LAYOUTCOMMIT is taking the size back down to 4096 from the
>>>>>>>>> last nfs_writeback_done(), and the second LAYOUTCOMMIT never brings it back
>>>>>>>>> up again.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Excellent! Thanks for debugging that.
>>>>>>>>
>>>>>>>>> Now I see that we should be marking the block extents as written atomically with
>>>>>>>>> setting LAYOUTCOMMIT and nfsi->layout->plh_lwb, otherwise a LAYOUTCOMMIT can
>>>>>>>>> collect extents just added from the next bl_write_cleanup().  Then, the next
>>>>>>>>> LAYOUTCOMMIT fails, and all we're left with is the size from the first
>>>>>>>>> LAYOUTCOMMIT.  Not sure if that particular problem is the whole fix, but
>>>>>>>>> that's something to work on.
>>>>>>>>>
>>>>>>>>> I see ways to fix that:
>>>>>>>>>
>>>>>>>>> - make a new pnfs_set_layoutcommit_locked() that can be used to call
>>>>>>>>> ext_tree_mark_written() inside the i_lock
>>>>>>>>>
>>>>>>>>> - make another pnfs_layoutdriver_type operation to be used within
>>>>>>>>> pnfs_set_layoutcommit (mark_layoutcommit? set_layoutcommit?), and call
>>>>>>>>> ext_tree_mark_written() within that..
>>>>>>>>>
>>>>>>>>> - have .prepare_layoutcommit return a new positive plh_lwb that would
>>>>>>>>> extend the current LAYOUTCOMMIT
>>>>>>>>>
>>>>>>>>> - make ext_tree_prepare_commit only encode up to plh_lwb
>>>>>>>>
>>>>>>>> I see no reason why ext_tree_prepare_commit() shouldn’t be allowed to extend the args->lastbytewritten. This is a metadata operation that is owned by the pNFS layout driver.
>>>>>>>> The only thing I’d note is you should then rewrite the failure case in pnfs_layoutcommit_inode() so that it doesn’t rely on the saved “end_pos”, but uses args->lastbytewritten instead (with a comment to the effect why)…
>>>>>>>
>>>>>>> In fact, given the potential for races here, I think the right thing to do is to have ext_tree_prepare_commit() always set the correct value for args->lastbytewritten.
>>>>>>
>>>>>> OK, that has cleared up that common failure case that was getting in the
>>>>>> way, but now it can still fail like this:
>>>>>>
>>>>>
>>>>> Good progress! :-)
>>>>>
>>>>>> nfs_writeback_update_inode sets size 4096 w/ NFS_INO_INVALID_ATTR set, and sets NFS_INO_LAYOUTCOMMIT
>>>>>> 1st nfs_getattr -> pnfs_layoutcommit_inode starts, clears layoutcommit flag sets NFS_INO_LAYOUTCOMMITING
>>>>>> nfs_writeback_update_inode sets size 8192 w/ NFS_INO_INVALID_ATTR set, and sets NFS_INO_LAYOUTCOMMIT
>>>>>> 1st nfs_getattr -> nfs4_layoutcommit_release sets size 4096, NFS_INO_INVALID_ATTR set, clears NFS_INO_LAYOUTCOMMITTING
>>>>>> 1st nfs_getattr -> __revalidate_inode sets size 4096, NFS_INO_INVALID_ATTR not set.. cache is valid
>>>>>> 2nd nfs_getattr immediately returns 4096 even though NFS_INO_LAYOUTCOMMIT
>>>>>
>>>>> Is this being tested on top of the current linux-next/testing? Normally, I’d expect http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=10b7e9ad44881fcd46ac24eb7374377c6e8962ed to cause 1st nfs_getattr() to not declare the cache valid.
>>>>
>>>> Yes, this is on your linux-next branch.
>>>>
>>>> When the 1st nfs_getattr() goes through nfs_update_inode() the second time
>>>> (during __revalidate_inode), NFS_INO_INVALID_ATTR is never set by anything,
>>>> since all the attributes returned match the cache.  So even though
>>>> NFS_INO_LAYOUTCOMMIT is set, and the cache_validity variable is "false",
>>>> the NFS_INO_INVALID_ATTR is never set in the "invalid" local variable.
>>>>
>>>> Should pnfs_layoutcommit_outstanding() always set NFS_INO_INVALID_ATTR?
>>>>
>>>> Ben
>>>
>>> nfs_post_op_update_inode_locked() should be doing that as part of the callchain in nfs_writeback_update_inode().
>>>
>>
>> By the way. I just noticed that nothing appears to be using the attributes we retrieve as part of the layoutcommit call. Does adding a nfs_refresh_inode() to the “success” path in nfs4_layoutcommit_done() perhaps help?
>
> We do it in layoutcommit_release:
>
> nfs4_layoutcommit_done [nfsv4]() {
>   ...
> }
> nfs4_layoutcommit_release [nfsv4]() {
>   ...
>   nfs_post_op_update_inode_force_wcc [nfs]() {
>     nfs_post_op_update_inode_force_wcc_locked [nfs]() {
>       nfs_post_op_update_inode_locked [nfs]() {
>         nfs4_have_delegation [nfsv4]() {
>           nfs4_do_check_delegation [nfsv4]();
>         }
>         nfs_refresh_inode_locked [nfs]() {
>           nfs_update_inode [nfs]() {
>
>
> Should I still try adding it in nfs4_layoutcommit_done()?

No, that’s OK. As long as we’re using it… Please see my previous email, though, for how we might change nfs_update_inode()

* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics
  2016-07-28 15:36                                                                                                               ` Trond Myklebust
@ 2016-07-28 16:40                                                                                                                 ` Benjamin Coddington
  2016-07-28 16:41                                                                                                                   ` Trond Myklebust
  0 siblings, 1 reply; 69+ messages in thread
From: Benjamin Coddington @ 2016-07-28 16:40 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: hch, List Linux NFS Mailing

On 28 Jul 2016, at 11:36, Trond Myklebust wrote:

>> On Jul 28, 2016, at 11:33, Benjamin Coddington <bcodding@redhat.com> wrote:
>>
>> On 28 Jul 2016, at 8:31, Trond Myklebust wrote:
>>
>>>> On Jul 28, 2016, at 05:47, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>
>>>>
>>>> On 27 Jul 2016, at 14:05, Trond Myklebust wrote:
>>>>
>>>>>> On Jul 27, 2016, at 12:14, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>>>
>>>>>> On 27 Jul 2016, at 8:31, Trond Myklebust wrote:
>>>>>>
>>>>>>>> On Jul 27, 2016, at 08:15, Trond Myklebust <trondmy@primarydata.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>> On Jul 27, 2016, at 07:55, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>>>>>>
>>>>>>>>> After adding more debugging, I see that all of that is working correctly,
>>>>>>>>> but the first LAYOUTCOMMIT is taking the size back down to 4096 from the
>>>>>>>>> last nfs_writeback_done(), and the second LAYOUTCOMMIT never brings it back
>>>>>>>>> up again.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Excellent! Thanks for debugging that.
>>>>>>>>
>>>>>>>>> Now I see that we should be marking the block extents as written atomically with
>>>>>>>>> setting LAYOUTCOMMIT and nfsi->layout->plh_lwb, otherwise a LAYOUTCOMMIT can
>>>>>>>>> collect extents just added from the next bl_write_cleanup().  Then, the next
>>>>>>>>> LAYOUTCOMMIT fails, and all we're left with is the size from the first
>>>>>>>>> LAYOUTCOMMIT.  Not sure if that particular problem is the whole fix, but
>>>>>>>>> that's something to work on.
>>>>>>>>>
>>>>>>>>> I see ways to fix that:
>>>>>>>>>
>>>>>>>>> - make a new pnfs_set_layoutcommit_locked() that can be used to call
>>>>>>>>>  ext_tree_mark_written() inside the i_lock
>>>>>>>>>
>>>>>>>>> - make another pnfs_layoutdriver_type operation to be used within
>>>>>>>>>  pnfs_set_layoutcommit (mark_layoutcommit? set_layoutcommit?), and call
>>>>>>>>>  ext_tree_mark_written() within that..
>>>>>>>>>
>>>>>>>>> - have .prepare_layoutcommit return a new positive plh_lwb that would
>>>>>>>>>  extend the current LAYOUTCOMMIT
>>>>>>>>>
>>>>>>>>> - make ext_tree_prepare_commit only encode up to plh_lwb
>>>>>>>>
>>>>>>>> I see no reason why ext_tree_prepare_commit() shouldn’t be allowed to extend the args->lastbytewritten. This is a metadata operation that is owned by the pNFS layout driver.
>>>>>>>> The only thing I’d note is you should then rewrite the failure case in pnfs_layoutcommit_inode() so that it doesn’t rely on the saved “end_pos”, but uses args->lastbytewritten instead (with a comment to the effect why)…
>>>>>>>
>>>>>>> In fact, given the potential for races here, I think the right thing to do is to have ext_tree_prepare_commit() always set the correct value for args->lastbytewritten.
>>>>>>
>>>>>> OK, that has cleared up that common failure case that was getting in the
>>>>>> way, but now it can still fail like this:
>>>>>>
>>>>>
>>>>> Good progress! :-)
>>>>>
>>>>>> nfs_writeback_update_inode sets size 4096 w/ NFS_INO_INVALID_ATTR set, and sets NFS_INO_LAYOUTCOMMIT
>>>>>> 1st nfs_getattr -> pnfs_layoutcommit_inode starts, clears layoutcommit flag sets NFS_INO_LAYOUTCOMMITING
>>>>>> nfs_writeback_update_inode sets size 8192 w/ NFS_INO_INVALID_ATTR set, and sets NFS_INO_LAYOUTCOMMIT
>>>>>> 1st nfs_getattr -> nfs4_layoutcommit_release sets size 4096, NFS_INO_INVALID_ATTR set, clears NFS_INO_LAYOUTCOMMITTING
>>>>>> 1st nfs_getattr -> __revalidate_inode sets size 4096, NFS_INO_INVALID_ATTR not set.. cache is valid
>>>>>> 2nd nfs_getattr immediately returns 4096 even though NFS_INO_LAYOUTCOMMIT
>>>>>
>>>>> Is this being tested on top of the current linux-next/testing? Normally, I’d expect http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=10b7e9ad44881fcd46ac24eb7374377c6e8962ed to cause 1st nfs_getattr() to not declare the cache valid.
>>>>
>>>> Yes, this is on your linux-next branch.
>>>>
>>>> When the 1st nfs_getattr() goes through nfs_update_inode() the second time
>>>> (during __revalidate_inode), NFS_INO_INVALID_ATTR is never set by anything,
>>>> since all the attributes returned match the cache.  So even though
>>>> NFS_INO_LAYOUTCOMMIT is set, and the cache_validity variable is "false",
>>>> the NFS_INO_INVALID_ATTR is never set in the "invalid" local variable.
>>>>
>>>> Should pnfs_layoutcommit_outstanding() always set NFS_INO_INVALID_ATTR?
>>>>
>>>> Ben
>>>
>>> nfs_post_op_update_inode_locked() should be doing that as part of the callchain in nfs_writeback_update_inode().
>>
>> And it is, and the bit persists through the next layoutcommit, it is the next GETATTR response that finds that all the attributes are the same and the bit is cleared.
>>
>
> So what if we require that nfsi->cache_validity be set to save_cache_validity & NFS_INO_INVALID_ATTR at a minimum if pnfs_layoutcommit_outstanding()?

With this, I am unable to reproduce the problem:

@@ -1665,7 +1684,7 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
         unsigned long now = jiffies;
         unsigned long save_cache_validity;
         bool have_writers = nfs_file_has_buffered_writers(nfsi);
-       bool cache_revalidated;
+       bool cache_revalidated = true;

         dfprintk(VFS, "NFS: %s(%s/%lu fh_crc=0x%08x ct=%d info=0x%x)\n",
                         __func__, inode->i_sb->s_id, inode->i_ino,
@@ -1714,8 +1733,10 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
         /* Do atomic weak cache consistency updates */
         invalid |= nfs_wcc_update_inode(inode, fattr);

-
-       cache_revalidated = !pnfs_layoutcommit_outstanding(inode);
+       if (pnfs_layoutcommit_outstanding(inode)) {
+               nfsi->cache_validity |= save_cache_validity & NFS_INO_INVALID_ATTR;
+               cache_revalidated = false;
+       }

I'll send these two patches along shortly unless otherwise called off..
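
The effect of the two hunks can be isolated in a pure function. Its inputs are:
- the validity bits saved before the reply was applied,
- the bits the reply itself found stale,
- whether pnfs_layoutcommit_outstanding() is true.

It returns what is left behind. This is a hypothetical model, not the kernel function: `toy_apply_reply()`, `struct toy_update` and `TOY_INVALID_ATTR` are illustrative names. It assumes the attribute bit has already been cleared from the inode before the reply is applied, so only `invalid` and the saved bit can set it again.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for NFS_INO_INVALID_ATTR; the value is arbitrary. */
#define TOY_INVALID_ATTR 0x1UL

/* What applying one attribute reply leaves behind in this model. */
struct toy_update {
	unsigned long cache_validity;	/* stale bits kept on the inode        */
	bool cache_revalidated;		/* reply counts as a full revalidation */
};

/*
 * Models the tail of the patched nfs_update_inode(): @save_cache_validity
 * holds the bits that were set before the reply was applied, @invalid the
 * bits the reply itself found stale.  While a LAYOUTCOMMIT is outstanding,
 * a previously set TOY_INVALID_ATTR survives even if the reply matched.
 */
struct toy_update toy_apply_reply(unsigned long save_cache_validity,
				  unsigned long invalid,
				  bool layoutcommit_outstanding)
{
	struct toy_update u = {
		.cache_validity = invalid,
		.cache_revalidated = true,
	};

	if (layoutcommit_outstanding) {
		u.cache_validity |= save_cache_validity & TOY_INVALID_ATTR;
		u.cache_revalidated = false;
	}
	return u;
}
```

With a LAYOUTCOMMIT outstanding, a reply that finds nothing stale (`invalid == 0`) no longer clears the attribute bit. In this model, the second getattr from the trace would therefore have to revalidate instead of returning 4096 from the cache. With nothing outstanding, the result is the same as before the patch.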

* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics
  2016-07-28 16:40                                                                                                                 ` Benjamin Coddington
@ 2016-07-28 16:41                                                                                                                   ` Trond Myklebust
  0 siblings, 0 replies; 69+ messages in thread
From: Trond Myklebust @ 2016-07-28 16:41 UTC (permalink / raw)
  To: Coddington Benjamin; +Cc: hch, List Linux NFS Mailing

DQo+IE9uIEp1bCAyOCwgMjAxNiwgYXQgMTI6NDAsIEJlbmphbWluIENvZGRpbmd0b24gPGJjb2Rk
aW5nQHJlZGhhdC5jb20+IHdyb3RlOg0KPiANCj4gT24gMjggSnVsIDIwMTYsIGF0IDExOjM2LCBU
cm9uZCBNeWtsZWJ1c3Qgd3JvdGU6DQo+IA0KPj4+IE9uIEp1bCAyOCwgMjAxNiwgYXQgMTE6MzMs
IEJlbmphbWluIENvZGRpbmd0b24gPGJjb2RkaW5nQHJlZGhhdC5jb20+IHdyb3RlOg0KPj4+IA0K
Pj4+IE9uIDI4IEp1bCAyMDE2LCBhdCA4OjMxLCBUcm9uZCBNeWtsZWJ1c3Qgd3JvdGU6DQo+Pj4g
DQo+Pj4+PiBPbiBKdWwgMjgsIDIwMTYsIGF0IDA1OjQ3LCBCZW5qYW1pbiBDb2RkaW5ndG9uIDxi
Y29kZGluZ0ByZWRoYXQuY29tPiB3cm90ZToNCj4+Pj4+IA0KPj4+Pj4gDQo+Pj4+PiBPbiAyNyBK
dWwgMjAxNiwgYXQgMTQ6MDUsIFRyb25kIE15a2xlYnVzdCB3cm90ZToNCj4+Pj4+IA0KPj4+Pj4+
PiBPbiBKdWwgMjcsIDIwMTYsIGF0IDEyOjE0LCBCZW5qYW1pbiBDb2RkaW5ndG9uIDxiY29kZGlu
Z0ByZWRoYXQuY29tPiB3cm90ZToNCj4+Pj4+Pj4gDQo+Pj4+Pj4+IE9uIDI3IEp1bCAyMDE2LCBh
dCA4OjMxLCBUcm9uZCBNeWtsZWJ1c3Qgd3JvdGU6DQo+Pj4+Pj4+IA0KPj4+Pj4+Pj4+IE9uIEp1
bCAyNywgMjAxNiwgYXQgMDg6MTUsIFRyb25kIE15a2xlYnVzdCA8dHJvbmRteUBwcmltYXJ5ZGF0
YS5jb20+IHdyb3RlOg0KPj4+Pj4+Pj4+IA0KPj4+Pj4+Pj4+IA0KPj4+Pj4+Pj4+PiBPbiBKdWwg
MjcsIDIwMTYsIGF0IDA3OjU1LCBCZW5qYW1pbiBDb2RkaW5ndG9uIDxiY29kZGluZ0ByZWRoYXQu
Y29tPiB3cm90ZToNCj4+Pj4+Pj4+Pj4gDQo+Pj4+Pj4+Pj4+IEFmdGVyIGFkZGluZyBtb3JlIGRl
YnVnZ2luZywgSSBzZWUgdGhhdCBhbGwgb2YgdGhhdCBpcyB3b3JraW5nIGNvcnJlY3RseSwNCj4+
Pj4+Pj4+Pj4gYnV0IHRoZSBmaXJzdCBMQVlPVVRDT01NSVQgaXMgdGFraW5nIHRoZSBzaXplIGJh
Y2sgZG93biB0byA0MDk2IGZyb20gdGhlDQo+Pj4+Pj4+Pj4+IGxhc3QgbmZzX3dyaXRlYmFja19k
b25lKCksIGFuZCB0aGUgc2Vjb25kIExBWU9VVENPTU1JVCBuZXZlciBicmluZ3MgaXQgYmFjaw0K
Pj4+Pj4+Pj4+PiB1cCBhZ2Fpbi4NCj4+Pj4+Pj4+Pj4gDQo+Pj4+Pj4+Pj4gDQo+Pj4+Pj4+Pj4g
RXhjZWxsZW50ISBUaGFua3MgZm9yIGRlYnVnZ2luZyB0aGF0Lg0KPj4+Pj4+Pj4+IA0KPj4+Pj4+
Pj4+PiBOb3cgSSBzZWUgdGhhdCB3ZSBzaG91bGQgYmUgbWFya2luZyB0aGUgYmxvY2sgZXh0ZW50
cyBhcyB3cml0dGVuIGF0b21pY2FsbHkgd2l0aA0KPj4+Pj4+Pj4+PiBzZXR0aW5nIExBWU9VVENP
TU1JVCBhbmQgbmZzaS0+bGF5b3V0LT5wbGhfbHdiLCBvdGhlcndpc2UgYSBMQVlPVVRDT01NSVQg
Y2FuDQo+Pj4+Pj4+Pj4+IGNvbGxlY3QgZXh0ZW50cyBqdXN0IGFkZGVkIGZyb20gdGhlIG5leHQg
Ymxfd3JpdGVfY2xlYW51cCgpLiAgVGhlbiwgdGhlIG5leHQNCj4+Pj4+Pj4+Pj4gTEFZT1VUQ09N
TUlUIGZhaWxzLCBhbmQgYWxsIHdlJ3JlIGxlZnQgd2l0aCBpcyB0aGUgc2l6ZSBmcm9tIHRoZSBm
aXJzdA0KPj4+Pj4+Pj4+PiBMQVlPVVRDT01NSVQuICBOb3Qgc3VyZSBpZiB0aGF0IHBhcnRpY3Vs
YXIgcHJvYmxlbSBpcyB0aGUgd2hvbGUgZml4LCBidXQNCj4+Pj4+Pj4+Pj4gdGhhdCdzIHNvbWV0
aGluZyB0byB3b3JrIG9uLg0KPj4+Pj4+Pj4+PiANCj4+Pj4+Pj4+Pj4gSSBzZWUgd2F5cyB0byBm
aXggdGhhdDoNCj4+Pj4+Pj4+Pj4gDQo+Pj4+Pj4+Pj4+IC0gbWFrZSBhIG5ldyBwbmZzX3NldF9s
YXlvdXRjb21taXRfbG9ja2VkKCkgdGhhdCBjYW4gYmUgdXNlZCB0byBjYWxsDQo+Pj4+Pj4+Pj4+
IGV4dF90cmVlX21hcmtfd3JpdHRlbigpIGluc2lkZSB0aGUgaV9sb2NrDQo+Pj4+Pj4+Pj4+IA0K
Pj4+Pj4+Pj4+PiAtIG1ha2UgYW5vdGhlciBwbmZzX2xheW91dGRyaXZlcl90eXBlIG9wZXJhdGlv
biB0byBiZSB1c2VkIHdpdGhpbg0KPj4+Pj4+Pj4+PiBwbmZzX3NldF9sYXlvdXRjb21taXQgKG1h
cmtfbGF5b3V0Y29tbWl0PyBzZXRfbGF5b3V0Y29tbWl0PyksIGFuZCBjYWxsDQo+Pj4+Pj4+Pj4+
IGV4dF90cmVlX21hcmtfd3JpdHRlbigpIHdpdGhpbiB0aGF0Li4NCj4+Pj4+Pj4+Pj4gDQo+Pj4+
Pj4+Pj4+IC0gaGF2ZSAucHJlcGFyZV9sYXlvdXRjb21taXQgcmV0dXJuIGEgbmV3IHBvc2l0aXZl
IHBsaF9sd2IgdGhhdCB3b3VsZA0KPj4+Pj4+Pj4+PiBleHRlbmQgdGhlIGN1cnJlbnQgTEFZT1VU
Q09NTUlUDQo+Pj4+Pj4+Pj4+IA0KPj4+Pj4+Pj4+PiAtIG1ha2UgZXh0X3RyZWVfcHJlcGFyZV9j
b21taXQgb25seSBlbmNvZGUgdXAgdG8gcGxoX2x3Yg0KPj4+Pj4+Pj4+IA0KPj4+Pj4+Pj4+IEkg
c2VlIG5vIHJlYXNvbiB3aHkgZXh0X3RyZWVfcHJlcGFyZV9jb21taXQoKSBzaG91bGRu4oCZdCBi
ZSBhbGxvd2VkIHRvIGV4dGVuZCB0aGUgYXJncy0+bGFzdGJ5dGV3cml0dGVuLiBUaGlzIGlzIGEg
metadata operation that is owned by the pNFS layout driver.
>>>>>>>>> The only thing I’d note is you should then rewrite the failure case in pnfs_layoutcommit_inode() so that it doesn’t rely on the saved “end_pos”, but uses args->lastbytewritten instead (with a comment to the effect why)…
>>>>>>>>
>>>>>>>> In fact, given the potential for races here, I think the right thing to do is to have ext_tree_prepare_commit() always set the correct value for args->lastbytewritten.
>>>>>>>
>>>>>>> OK, that has cleared up that common failure case that was getting in the
>>>>>>> way, but now it can still fail like this:
>>>>>>>
>>>>>>
>>>>>> Good progress! :-)
>>>>>>
>>>>>>> nfs_writeback_update_inode sets size 4096 w/ NFS_INO_INVALID_ATTR set, and sets NFS_INO_LAYOUTCOMMIT
>>>>>>> 1st nfs_getattr -> pnfs_layoutcommit_inode starts, clears layoutcommit flag, sets NFS_INO_LAYOUTCOMMITTING
>>>>>>> nfs_writeback_update_inode sets size 8192 w/ NFS_INO_INVALID_ATTR set, and sets NFS_INO_LAYOUTCOMMIT
>>>>>>> 1st nfs_getattr -> nfs4_layoutcommit_release sets size 4096, NFS_INO_INVALID_ATTR set, clears NFS_INO_LAYOUTCOMMITTING
>>>>>>> 1st nfs_getattr -> __revalidate_inode sets size 4096, NFS_INO_INVALID_ATTR not set, cache is valid
>>>>>>> 2nd nfs_getattr immediately returns 4096 even though NFS_INO_LAYOUTCOMMIT
>>>>>>
>>>>>> Is this being tested on top of the current linux-next/testing? Normally, I’d expect http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=10b7e9ad44881fcd46ac24eb7374377c6e8962ed to cause 1st nfs_getattr() to not declare the cache valid.
>>>>>
>>>>> Yes, this is on your linux-next branch.
>>>>>
>>>>> When the 1st nfs_getattr() goes through nfs_update_inode() the second time
>>>>> (during __revalidate_inode), NFS_INO_INVALID_ATTR is never set by anything,
>>>>> since all the attributes returned match the cache.  So even though
>>>>> NFS_INO_LAYOUTCOMMIT is set, and the cache_validity variable is "false",
>>>>> the NFS_INO_INVALID_ATTR is never set in the "invalid" local variable.
>>>>>
>>>>> Should pnfs_layoutcommit_outstanding() always set NFS_INO_INVALID_ATTR?
>>>>>
>>>>> Ben
>>>>
>>>> nfs_post_op_update_inode_locked() should be doing that as part of the callchain in nfs_writeback_update_inode().
>>>
>>> And it is, and the bit persists through the next layoutcommit, it is the next GETATTR response that finds that all the attributes are the same and the bit is cleared.
>>>
>>
>> So what if we require that nfsi->cache_validity be set to save_cache_validity & NFS_INO_INVALID_ATTR at a minimum if pnfs_layoutcommit_outstanding()?
>
> With this, I am unable to reproduce the problem:
>
> @@ -1665,7 +1684,7 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
>        unsigned long now = jiffies;
>        unsigned long save_cache_validity;
>        bool have_writers = nfs_file_has_buffered_writers(nfsi);
> -       bool cache_revalidated;
> +       bool cache_revalidated = true;
>
>        dfprintk(VFS, "NFS: %s(%s/%lu fh_crc=0x%08x ct=%d info=0x%x)\n",
>                        __func__, inode->i_sb->s_id, inode->i_ino,
> @@ -1714,8 +1733,10 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
>        /* Do atomic weak cache consistency updates */
>        invalid |= nfs_wcc_update_inode(inode, fattr);
>
> -
> -       cache_revalidated = !pnfs_layoutcommit_outstanding(inode);
> +       if (pnfs_layoutcommit_outstanding(inode)) {
> +               nfsi->cache_validity |= save_cache_validity & NFS_INO_INVALID_ATTR;
> +               cache_revalidated = false;
> +       }
>
> I'll send these two patches along shortly unless otherwise called off.

That looks just fine. Thanks for your work on this!

Thread overview: 69+ messages
2016-07-06 22:29 [PATCH v4 00/28] NFS writeback performance patches for v4.8 Trond Myklebust
2016-07-06 22:29 ` [PATCH v4 01/28] NFS: Don't flush caches for a getattr that races with writeback Trond Myklebust
2016-07-06 22:29   ` [PATCH v4 02/28] NFS: Cache access checks more aggressively Trond Myklebust
2016-07-06 22:29     ` [PATCH v4 03/28] NFS: Cache aggressively when file is open for writing Trond Myklebust
2016-07-06 22:29       ` [PATCH v4 04/28] NFS: Kill NFS_INO_NFS_INO_FLUSHING: it is a performance killer Trond Myklebust
2016-07-06 22:29         ` [PATCH v4 05/28] NFS: writepage of a single page should not be synchronous Trond Myklebust
2016-07-06 22:29           ` [PATCH v4 06/28] NFS: Don't hold the inode lock across fsync() Trond Myklebust
2016-07-06 22:29             ` [PATCH v4 07/28] NFS: Don't call COMMIT in ->releasepage() Trond Myklebust
2016-07-06 22:29               ` [PATCH v4 08/28] pNFS/files: Fix layoutcommit after a commit to DS Trond Myklebust
2016-07-06 22:29                 ` [PATCH v4 09/28] pNFS/flexfiles: " Trond Myklebust
2016-07-06 22:29                   ` [PATCH v4 10/28] pNFS/flexfiles: Clean up calls to pnfs_set_layoutcommit() Trond Myklebust
2016-07-06 22:29                     ` [PATCH v4 11/28] pNFS: Files and flexfiles always need to commit before layoutcommit Trond Myklebust
2016-07-06 22:29                       ` [PATCH v4 12/28] pNFS: Ensure we layoutcommit before revalidating attributes Trond Myklebust
2016-07-06 22:29                         ` [PATCH v4 13/28] pNFS: pnfs_layoutcommit_outstanding() is no longer used when !CONFIG_NFS_V4_1 Trond Myklebust
2016-07-06 22:29                           ` [PATCH v4 14/28] NFS: Fix O_DIRECT verifier problems Trond Myklebust
2016-07-06 22:29                             ` [PATCH v4 15/28] NFS: Ensure we reset the write verifier 'committed' value on resend Trond Myklebust
2016-07-06 22:29                               ` [PATCH v4 16/28] NFS: Remove racy size manipulations in O_DIRECT Trond Myklebust
2016-07-06 22:29                                 ` [PATCH v4 17/28] NFS Cleanup: move call to generic_write_checks() into fs/nfs/direct.c Trond Myklebust
2016-07-06 22:29                                   ` [PATCH v4 18/28] NFS: Move buffered I/O locking into nfs_file_write() Trond Myklebust
2016-07-06 22:29                                     ` [PATCH v4 19/28] NFS: Do not serialise O_DIRECT reads and writes Trond Myklebust
2016-07-06 22:29                                       ` [PATCH v4 20/28] NFS: Cleanup nfs_direct_complete() Trond Myklebust
2016-07-06 22:29                                         ` [PATCH v4 21/28] NFS: Remove redundant waits for O_DIRECT in fsync() and write_begin() Trond Myklebust
2016-07-06 22:29                                           ` [PATCH v4 22/28] NFS: Remove unused function nfs_revalidate_mapping_protected() Trond Myklebust
2016-07-06 22:30                                             ` [PATCH v4 23/28] NFS: Do not aggressively cache file attributes in the case of O_DIRECT Trond Myklebust
2016-07-06 22:30                                               ` [PATCH v4 24/28] NFS: Getattr doesn't require data sync semantics Trond Myklebust
2016-07-06 22:30                                                 ` [PATCH v4 25/28] NFSv4.2: Fix a race in nfs42_proc_deallocate() Trond Myklebust
2016-07-06 22:30                                                   ` [PATCH v4 26/28] NFSv4.2: Fix writeback races in nfs4_copy_file_range Trond Myklebust
2016-07-06 22:30                                                     ` [PATCH v4 27/28] NFSv4.2: llseek(SEEK_HOLE) and llseek(SEEK_DATA) don't require data sync Trond Myklebust
2016-07-06 22:30                                                       ` [PATCH v4 28/28] NFS nfs_vm_page_mkwrite: Don't freeze me, Bro Trond Myklebust
2016-07-18  3:48                                                 ` [PATCH v4 24/28] NFS: Getattr doesn't require data sync semantics Christoph Hellwig
2016-07-18  4:32                                                   ` Trond Myklebust
2016-07-18  4:59                                                     ` Trond Myklebust
2016-07-19  3:58                                                       ` hch
2016-07-19 20:00                                                         ` [PATCH v4 24/28] " Benjamin Coddington
2016-07-19 20:06                                                           ` Trond Myklebust
2016-07-20 15:03                                                             ` Benjamin Coddington
2016-07-21  8:22                                                               ` hch
2016-07-21  8:32                                                                 ` Benjamin Coddington
2016-07-21  9:10                                                                   ` Benjamin Coddington
2016-07-21  9:52                                                                     ` Benjamin Coddington
2016-07-21 12:46                                                                       ` Trond Myklebust
2016-07-21 13:05                                                                         ` Benjamin Coddington
2016-07-21 13:20                                                                           ` Trond Myklebust
2016-07-21 14:00                                                                             ` Trond Myklebust
2016-07-21 14:02                                                                             ` Benjamin Coddington
2016-07-25 16:26                                                                             ` Benjamin Coddington
2016-07-25 16:39                                                                               ` Trond Myklebust
2016-07-25 18:26                                                                                 ` Benjamin Coddington
2016-07-25 18:34                                                                                   ` Trond Myklebust
2016-07-25 18:41                                                                                     ` Benjamin Coddington
2016-07-26 16:32                                                                                       ` Benjamin Coddington
2016-07-26 16:35                                                                                         ` Trond Myklebust
2016-07-26 17:57                                                                                           ` Benjamin Coddington
2016-07-26 18:07                                                                                             ` Trond Myklebust
2016-07-27 11:55                                                                                               ` Benjamin Coddington
2016-07-27 12:15                                                                                                 ` Trond Myklebust
2016-07-27 12:31                                                                                                   ` Trond Myklebust
2016-07-27 16:14                                                                                                     ` Benjamin Coddington
2016-07-27 18:05                                                                                                       ` Trond Myklebust
2016-07-28  9:47                                                                                                         ` Benjamin Coddington
2016-07-28 12:31                                                                                                           ` Trond Myklebust
2016-07-28 14:04                                                                                                             ` Trond Myklebust
2016-07-28 15:38                                                                                                               ` Benjamin Coddington
2016-07-28 15:39                                                                                                                 ` Trond Myklebust
2016-07-28 15:33                                                                                                             ` Benjamin Coddington
2016-07-28 15:36                                                                                                               ` Trond Myklebust
2016-07-28 16:40                                                                                                                 ` Benjamin Coddington
2016-07-28 16:41                                                                                                                   ` Trond Myklebust
2016-07-19 20:09                                                           ` Benjamin Coddington