* [PATCH v4 00/28] NFS writeback performance patches for v4.8
@ 2016-07-06 22:29 Trond Myklebust
  2016-07-06 22:29 ` [PATCH v4 01/28] NFS: Don't flush caches for a getattr that races with writeback Trond Myklebust
  0 siblings, 1 reply; 69+ messages in thread
From: Trond Myklebust @ 2016-07-06 22:29 UTC (permalink / raw)
  To: linux-nfs

These patches aim to improve NFS I/O performance:

1) Add a new locking scheme between O_DIRECT and buffered I/O in order to
   allow parallelisation of O_DIRECT.
2) Remove instances of inode_lock that are preventing parallelism of
   writes and attribute revalidation.
3) Remove a redundant write throttling scheme that was conspiring with
   the FLUSH_COND_STABLE mode to drive writeback performance down.
4) Cache attributes more aggressively when we can assume close-to-open
   cache consistency.
5) Don't force data sync to disk where it is not needed. If attributes
   need to be up to date, we can usually get by with unstable writes to
   the server, particularly when we're using NFSv4 and can rely on
   stateids to cause operations to fail on a server reboot.
6) Fix a number of bugs around pNFS and attributes. Some (but not all)
   pNFS dialects may need data sync + layoutcommit in order to ensure
   that data and attribute updates are visible. In particular, fix one
   bug that was causing file size changes to be clobbered when
   layoutcommit was pending.

Trond Myklebust (28):
  NFS: Don't flush caches for a getattr that races with writeback
  NFS: Cache access checks more aggressively
  NFS: Cache aggressively when file is open for writing
  NFS: Kill NFS_INO_NFS_INO_FLUSHING: it is a performance killer
  NFS: writepage of a single page should not be synchronous
  NFS: Don't hold the inode lock across fsync()
  NFS: Don't call COMMIT in ->releasepage()
  pNFS/files: Fix layoutcommit after a commit to DS
  pNFS/flexfiles: Fix layoutcommit after a commit to DS
  pNFS/flexfiles: Clean up calls to pnfs_set_layoutcommit()
  pNFS: Files and flexfiles always need to commit before layoutcommit
  pNFS: Ensure we layoutcommit before revalidating attributes
  pNFS: pnfs_layoutcommit_outstanding() is no longer used when !CONFIG_NFS_V4_1
  NFS: Fix O_DIRECT verifier problems
  NFS: Ensure we reset the write verifier 'committed' value on resend.
  NFS: Remove racy size manipulations in O_DIRECT
  NFS Cleanup: move call to generic_write_checks() into fs/nfs/direct.c
  NFS: Move buffered I/O locking into nfs_file_write()
  NFS: Do not serialise O_DIRECT reads and writes
  NFS: Cleanup nfs_direct_complete()
  NFS: Remove redundant waits for O_DIRECT in fsync() and write_begin()
  NFS: Remove unused function nfs_revalidate_mapping_protected()
  NFS: Do not aggressively cache file attributes in the case of O_DIRECT
  NFS: Getattr doesn't require data sync semantics
  NFSv4.2: Fix a race in nfs42_proc_deallocate()
  NFSv4.2: Fix writeback races in nfs4_copy_file_range
  NFSv4.2: llseek(SEEK_HOLE) and llseek(SEEK_DATA) don't require data sync
  NFS nfs_vm_page_mkwrite: Don't freeze me, Bro...

 fs/nfs/Makefile                        |   2 +-
 fs/nfs/dir.c                           |  52 +++++++-----
 fs/nfs/direct.c                        |  93 +++++++--------------
 fs/nfs/file.c                          |  96 ++++++---------------
 fs/nfs/filelayout/filelayout.c         |  12 +--
 fs/nfs/flexfilelayout/flexfilelayout.c |  23 +++---
 fs/nfs/inode.c                         | 133 ++++++++++++++---------------
 fs/nfs/internal.h                      |  40 +++++++++
 fs/nfs/io.c                            | 147 +++++++++++++++++++++++++++++++++
 fs/nfs/nfs42proc.c                     |  21 ++++-
 fs/nfs/nfs4file.c                      |  14 +---
 fs/nfs/nfs4xdr.c                       |  11 ++-
 fs/nfs/nfstrace.h                      |   1 -
 fs/nfs/pnfs.c                          |   5 +-
 fs/nfs/pnfs.h                          |   7 --
 fs/nfs/pnfs_nfs.c                      |   7 ++
 fs/nfs/write.c                         |  33 +++++---
 include/linux/nfs_fs.h                 |   3 +-
 18 files changed, 420 insertions(+), 280 deletions(-)
 create mode 100644 fs/nfs/io.c

-- 
2.7.4

^ permalink raw reply	[flat|nested] 69+ messages in thread
* [PATCH v4 01/28] NFS: Don't flush caches for a getattr that races with writeback
  2016-07-06 22:29 [PATCH v4 00/28] NFS writeback performance patches for v4.8 Trond Myklebust
@ 2016-07-06 22:29 ` Trond Myklebust
  2016-07-06 22:29 ` [PATCH v4 02/28] NFS: Cache access checks more aggressively Trond Myklebust
  0 siblings, 1 reply; 69+ messages in thread
From: Trond Myklebust @ 2016-07-06 22:29 UTC (permalink / raw)
  To: linux-nfs

If there were outstanding writes then chalk up the unexpected change
attribute on the server to them.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/inode.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 52e7d6869e3b..60051e62d3f1 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -1729,12 +1729,15 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
 		if (inode->i_version != fattr->change_attr) {
 			dprintk("NFS: change_attr change on server for file %s/%ld\n",
 					inode->i_sb->s_id, inode->i_ino);
-			invalid |= NFS_INO_INVALID_ATTR
-				| NFS_INO_INVALID_DATA
-				| NFS_INO_INVALID_ACCESS
-				| NFS_INO_INVALID_ACL;
-			if (S_ISDIR(inode->i_mode))
-				nfs_force_lookup_revalidate(inode);
+			/* Could it be a race with writeback? */
+			if (nfsi->nrequests == 0) {
+				invalid |= NFS_INO_INVALID_ATTR
+					| NFS_INO_INVALID_DATA
+					| NFS_INO_INVALID_ACCESS
+					| NFS_INO_INVALID_ACL;
+				if (S_ISDIR(inode->i_mode))
+					nfs_force_lookup_revalidate(inode);
+			}
 			inode->i_version = fattr->change_attr;
 		}
 	} else {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 69+ messages in thread
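Editorial sketch (not part of the posted patch): the decision that patch 01 introduces can be modelled in user space roughly as below. The `model_*` names and `MODEL_INVALID_*` bits are hypothetical stand-ins for `struct nfs_inode` and the `NFS_INO_INVALID_*` flags, and the directory-specific `nfs_force_lookup_revalidate()` call is left out. The point is only that a change attribute mismatch is always recorded, but the caches are flagged invalid only when no write requests are outstanding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's NFS_INO_INVALID_* bits. */
enum {
	MODEL_INVALID_ATTR   = 1u << 0,
	MODEL_INVALID_DATA   = 1u << 1,
	MODEL_INVALID_ACCESS = 1u << 2,
	MODEL_INVALID_ACL    = 1u << 3,
};

#define MODEL_INVALID_ALL \
	(MODEL_INVALID_ATTR | MODEL_INVALID_DATA | \
	 MODEL_INVALID_ACCESS | MODEL_INVALID_ACL)

/* Hypothetical, cut-down model of the inode state the hunk inspects. */
struct model_inode {
	uint64_t change_attr;     /* cached change attribute (i_version) */
	unsigned long nrequests;  /* outstanding write requests */
};

/*
 * Returns the invalidation flags to apply after the server reported
 * server_change_attr. The cached value is always updated on a mismatch;
 * the caches are only flagged when there is no writeback in flight that
 * could explain the change.
 */
static unsigned int model_note_change_attr(struct model_inode *inode,
					   uint64_t server_change_attr)
{
	unsigned int invalid = 0;

	if (inode->change_attr == server_change_attr)
		return 0;
	/* Could it be a race with writeback? */
	if (inode->nrequests == 0)
		invalid = MODEL_INVALID_ALL;
	inode->change_attr = server_change_attr;
	return invalid;
}
```

With writes in flight the mismatch is attributed to them and nothing is flushed; with none, the full set of flags is returned, as in the pre-patch behaviour.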
* [PATCH v4 02/28] NFS: Cache access checks more aggressively 2016-07-06 22:29 ` [PATCH v4 01/28] NFS: Don't flush caches for a getattr that races with writeback Trond Myklebust @ 2016-07-06 22:29 ` Trond Myklebust 2016-07-06 22:29 ` [PATCH v4 03/28] NFS: Cache aggressively when file is open for writing Trond Myklebust 0 siblings, 1 reply; 69+ messages in thread From: Trond Myklebust @ 2016-07-06 22:29 UTC (permalink / raw) To: linux-nfs If an attribute revalidation fails, then we already know that we'll zap the access cache. If, OTOH, the inode isn't changing, there should be no need to eject access calls just because they are old. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> --- fs/nfs/dir.c | 52 +++++++++++++++++++++++++++++++--------------------- 1 file changed, 31 insertions(+), 21 deletions(-) diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c index aaf7bd0cbae2..210b33636fe4 100644 --- a/fs/nfs/dir.c +++ b/fs/nfs/dir.c @@ -2228,21 +2228,37 @@ static struct nfs_access_entry *nfs_access_search_rbtree(struct inode *inode, st return NULL; } -static int nfs_access_get_cached(struct inode *inode, struct rpc_cred *cred, struct nfs_access_entry *res) +static int nfs_access_get_cached(struct inode *inode, struct rpc_cred *cred, struct nfs_access_entry *res, bool may_block) { struct nfs_inode *nfsi = NFS_I(inode); struct nfs_access_entry *cache; - int err = -ENOENT; + bool retry = true; + int err; spin_lock(&inode->i_lock); - if (nfsi->cache_validity & NFS_INO_INVALID_ACCESS) - goto out_zap; - cache = nfs_access_search_rbtree(inode, cred); - if (cache == NULL) - goto out; - if (!nfs_have_delegated_attributes(inode) && - !time_in_range_open(jiffies, cache->jiffies, cache->jiffies + nfsi->attrtimeo)) - goto out_stale; + for(;;) { + if (nfsi->cache_validity & NFS_INO_INVALID_ACCESS) + goto out_zap; + cache = nfs_access_search_rbtree(inode, cred); + err = -ENOENT; + if (cache == NULL) + goto out; + /* Found an entry, is our attribute cache valid? 
*/ + if (!nfs_attribute_cache_expired(inode) && + !(nfsi->cache_validity & NFS_INO_INVALID_ATTR)) + break; + err = -ECHILD; + if (!may_block) + goto out; + if (!retry) + goto out_zap; + spin_unlock(&inode->i_lock); + err = __nfs_revalidate_inode(NFS_SERVER(inode), inode); + if (err) + return err; + spin_lock(&inode->i_lock); + retry = false; + } res->jiffies = cache->jiffies; res->cred = cache->cred; res->mask = cache->mask; @@ -2251,12 +2267,6 @@ static int nfs_access_get_cached(struct inode *inode, struct rpc_cred *cred, str out: spin_unlock(&inode->i_lock); return err; -out_stale: - rb_erase(&cache->rb_node, &nfsi->access_cache); - list_del(&cache->lru); - spin_unlock(&inode->i_lock); - nfs_access_free_entry(cache); - return -ENOENT; out_zap: spin_unlock(&inode->i_lock); nfs_access_zap_cache(inode); @@ -2283,13 +2293,12 @@ static int nfs_access_get_cached_rcu(struct inode *inode, struct rpc_cred *cred, cache = NULL; if (cache == NULL) goto out; - if (!nfs_have_delegated_attributes(inode) && - !time_in_range_open(jiffies, cache->jiffies, cache->jiffies + nfsi->attrtimeo)) + err = nfs_revalidate_inode_rcu(NFS_SERVER(inode), inode); + if (err) goto out; res->jiffies = cache->jiffies; res->cred = cache->cred; res->mask = cache->mask; - err = 0; out: rcu_read_unlock(); return err; @@ -2378,18 +2387,19 @@ EXPORT_SYMBOL_GPL(nfs_access_set_mask); static int nfs_do_access(struct inode *inode, struct rpc_cred *cred, int mask) { struct nfs_access_entry cache; + bool may_block = (mask & MAY_NOT_BLOCK) == 0; int status; trace_nfs_access_enter(inode); status = nfs_access_get_cached_rcu(inode, cred, &cache); if (status != 0) - status = nfs_access_get_cached(inode, cred, &cache); + status = nfs_access_get_cached(inode, cred, &cache, may_block); if (status == 0) goto out_cached; status = -ECHILD; - if (mask & MAY_NOT_BLOCK) + if (!may_block) goto out; /* Be clever: ask server to check for all possible rights */ -- 2.7.4 ^ permalink raw reply related [flat|nested] 69+ messages in 
thread
* [PATCH v4 03/28] NFS: Cache aggressively when file is open for writing 2016-07-06 22:29 ` [PATCH v4 02/28] NFS: Cache access checks more aggressively Trond Myklebust @ 2016-07-06 22:29 ` Trond Myklebust 2016-07-06 22:29 ` [PATCH v4 04/28] NFS: Kill NFS_INO_NFS_INO_FLUSHING: it is a performance killer Trond Myklebust 0 siblings, 1 reply; 69+ messages in thread From: Trond Myklebust @ 2016-07-06 22:29 UTC (permalink / raw) To: linux-nfs Unless the user is using file locking, we must assume close-to-open cache consistency when the file is open for writing. Adjust the caching algorithm so that it does not clear the cache on out-of-order writes and/or attribute revalidations. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> --- fs/nfs/file.c | 13 ++---------- fs/nfs/inode.c | 62 +++++++++++++++++++++++++++++++++++++++++----------------- 2 files changed, 46 insertions(+), 29 deletions(-) diff --git a/fs/nfs/file.c b/fs/nfs/file.c index 717a8d6af52d..2d39d9f9da7d 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -780,11 +780,6 @@ do_unlk(struct file *filp, int cmd, struct file_lock *fl, int is_local) } static int -is_time_granular(struct timespec *ts) { - return ((ts->tv_sec == 0) && (ts->tv_nsec <= 1000)); -} - -static int do_setlk(struct file *filp, int cmd, struct file_lock *fl, int is_local) { struct inode *inode = filp->f_mapping->host; @@ -817,12 +812,8 @@ do_setlk(struct file *filp, int cmd, struct file_lock *fl, int is_local) * This makes locking act as a cache coherency point. 
*/ nfs_sync_mapping(filp->f_mapping); - if (!NFS_PROTO(inode)->have_delegation(inode, FMODE_READ)) { - if (is_time_granular(&NFS_SERVER(inode)->time_delta)) - __nfs_revalidate_inode(NFS_SERVER(inode), inode); - else - nfs_zap_caches(inode); - } + if (!NFS_PROTO(inode)->have_delegation(inode, FMODE_READ)) + nfs_zap_mapping(inode, filp->f_mapping); out: return status; } diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c index 60051e62d3f1..4e65a5a8a01b 100644 --- a/fs/nfs/inode.c +++ b/fs/nfs/inode.c @@ -878,7 +878,10 @@ void nfs_inode_attach_open_context(struct nfs_open_context *ctx) struct nfs_inode *nfsi = NFS_I(inode); spin_lock(&inode->i_lock); - list_add(&ctx->list, &nfsi->open_files); + if (ctx->mode & FMODE_WRITE) + list_add(&ctx->list, &nfsi->open_files); + else + list_add_tail(&ctx->list, &nfsi->open_files); spin_unlock(&inode->i_lock); } EXPORT_SYMBOL_GPL(nfs_inode_attach_open_context); @@ -1215,6 +1218,25 @@ int nfs_revalidate_mapping_protected(struct inode *inode, struct address_space * return __nfs_revalidate_mapping(inode, mapping, true); } +static bool nfs_file_has_writers(struct nfs_inode *nfsi) +{ + struct inode *inode = &nfsi->vfs_inode; + + assert_spin_locked(&inode->i_lock); + + if (!S_ISREG(inode->i_mode)) + return false; + if (list_empty(&nfsi->open_files)) + return false; + /* Note: This relies on nfsi->open_files being ordered with writers + * being placed at the head of the list. 
+ * See nfs_inode_attach_open_context() + */ + return (list_first_entry(&nfsi->open_files, + struct nfs_open_context, + list)->mode & FMODE_WRITE) == FMODE_WRITE; +} + static unsigned long nfs_wcc_update_inode(struct inode *inode, struct nfs_fattr *fattr) { struct nfs_inode *nfsi = NFS_I(inode); @@ -1279,22 +1301,24 @@ static int nfs_check_inode_attributes(struct inode *inode, struct nfs_fattr *fat if ((fattr->valid & NFS_ATTR_FATTR_TYPE) && (inode->i_mode & S_IFMT) != (fattr->mode & S_IFMT)) return -EIO; - if ((fattr->valid & NFS_ATTR_FATTR_CHANGE) != 0 && - inode->i_version != fattr->change_attr) - invalid |= NFS_INO_INVALID_ATTR|NFS_INO_REVAL_PAGECACHE; + if (!nfs_file_has_writers(nfsi)) { + /* Verify a few of the more important attributes */ + if ((fattr->valid & NFS_ATTR_FATTR_CHANGE) != 0 && inode->i_version != fattr->change_attr) + invalid |= NFS_INO_INVALID_ATTR | NFS_INO_REVAL_PAGECACHE; - /* Verify a few of the more important attributes */ - if ((fattr->valid & NFS_ATTR_FATTR_MTIME) && !timespec_equal(&inode->i_mtime, &fattr->mtime)) - invalid |= NFS_INO_INVALID_ATTR; + if ((fattr->valid & NFS_ATTR_FATTR_MTIME) && !timespec_equal(&inode->i_mtime, &fattr->mtime)) + invalid |= NFS_INO_INVALID_ATTR; - if (fattr->valid & NFS_ATTR_FATTR_SIZE) { - cur_size = i_size_read(inode); - new_isize = nfs_size_to_loff_t(fattr->size); - if (cur_size != new_isize) - invalid |= NFS_INO_INVALID_ATTR|NFS_INO_REVAL_PAGECACHE; + if ((fattr->valid & NFS_ATTR_FATTR_CTIME) && !timespec_equal(&inode->i_ctime, &fattr->ctime)) + invalid |= NFS_INO_INVALID_ATTR; + + if (fattr->valid & NFS_ATTR_FATTR_SIZE) { + cur_size = i_size_read(inode); + new_isize = nfs_size_to_loff_t(fattr->size); + if (cur_size != new_isize) + invalid |= NFS_INO_INVALID_ATTR|NFS_INO_REVAL_PAGECACHE; + } } - if (nfsi->nrequests != 0) - invalid &= ~NFS_INO_REVAL_PAGECACHE; /* Have any file permissions changed? 
*/ if ((fattr->valid & NFS_ATTR_FATTR_MODE) && (inode->i_mode & S_IALLUGO) != (fattr->mode & S_IALLUGO)) @@ -1526,7 +1550,7 @@ EXPORT_SYMBOL_GPL(nfs_refresh_inode); static int nfs_post_op_update_inode_locked(struct inode *inode, struct nfs_fattr *fattr) { - unsigned long invalid = NFS_INO_INVALID_ATTR|NFS_INO_REVAL_PAGECACHE; + unsigned long invalid = NFS_INO_INVALID_ATTR; /* * Don't revalidate the pagecache if we hold a delegation, but do @@ -1675,6 +1699,7 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr) unsigned long invalid = 0; unsigned long now = jiffies; unsigned long save_cache_validity; + bool have_writers = nfs_file_has_writers(nfsi); bool cache_revalidated = true; dfprintk(VFS, "NFS: %s(%s/%lu fh_crc=0x%08x ct=%d info=0x%x)\n", @@ -1730,7 +1755,7 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr) dprintk("NFS: change_attr change on server for file %s/%ld\n", inode->i_sb->s_id, inode->i_ino); /* Could it be a race with writeback? */ - if (nfsi->nrequests == 0) { + if (!have_writers) { invalid |= NFS_INO_INVALID_ATTR | NFS_INO_INVALID_DATA | NFS_INO_INVALID_ACCESS @@ -1770,9 +1795,10 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr) if (new_isize != cur_isize) { /* Do we perhaps have any outstanding writes, or has * the file grown beyond our last write? */ - if ((nfsi->nrequests == 0) || new_isize > cur_isize) { + if (nfsi->nrequests == 0 || new_isize > cur_isize) { i_size_write(inode, new_isize); - invalid |= NFS_INO_INVALID_ATTR|NFS_INO_INVALID_DATA; + if (!have_writers) + invalid |= NFS_INO_INVALID_ATTR|NFS_INO_INVALID_DATA; } dprintk("NFS: isize change on server for file %s/%ld " "(%Ld to %Ld)\n", -- 2.7.4 ^ permalink raw reply related [flat|nested] 69+ messages in thread
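Editorial sketch (not part of the posted patch): patch 03 makes `nfs_file_has_writers()` cheap by keeping `nfsi->open_files` ordered, with write contexts added at the head and read-only contexts at the tail, so only the first entry needs to be inspected. The user-space model below illustrates that invariant with a hand-rolled circular list; every `model_*` name is hypothetical, and the `S_ISREG()` check and `i_lock` assertion from the real function are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_FMODE_READ  0x1u
#define MODEL_FMODE_WRITE 0x2u

/* Hypothetical open context: just the mode plus list linkage. */
struct model_open_ctx {
	unsigned int mode;
	struct model_open_ctx *prev, *next;
};

/* Hypothetical inode: the sentinel node of a circular doubly linked list. */
struct model_inode {
	struct model_open_ctx open_files;
};

static void model_inode_init(struct model_inode *inode)
{
	inode->open_files.mode = 0;
	inode->open_files.prev = &inode->open_files;
	inode->open_files.next = &inode->open_files;
}

/* Link ctx between two adjacent nodes. */
static void model_link(struct model_open_ctx *ctx,
		       struct model_open_ctx *prev,
		       struct model_open_ctx *next)
{
	ctx->prev = prev;
	ctx->next = next;
	prev->next = ctx;
	next->prev = ctx;
}

/* Writers go to the head of the list, everyone else to the tail. */
static void model_attach_open_context(struct model_inode *inode,
				      struct model_open_ctx *ctx)
{
	struct model_open_ctx *head = &inode->open_files;

	if (ctx->mode & MODEL_FMODE_WRITE)
		model_link(ctx, head, head->next);
	else
		model_link(ctx, head->prev, head);
}

/* Relies on the ordering above: if any writer exists, it is first. */
static bool model_file_has_writers(const struct model_inode *inode)
{
	const struct model_open_ctx *first = inode->open_files.next;

	if (first == &inode->open_files)
		return false; /* no open contexts at all */
	return (first->mode & MODEL_FMODE_WRITE) != 0;
}
```

The trade-off is that the check stays O(1) under the lock, at the cost of requiring every insertion path to respect the head/tail convention.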
* [PATCH v4 04/28] NFS: Kill NFS_INO_NFS_INO_FLUSHING: it is a performance killer
  2016-07-06 22:29 ` [PATCH v4 03/28] NFS: Cache aggressively when file is open for writing Trond Myklebust
@ 2016-07-06 22:29 ` Trond Myklebust
  2016-07-06 22:29 ` [PATCH v4 05/28] NFS: writepage of a single page should not be synchronous Trond Myklebust
  0 siblings, 1 reply; 69+ messages in thread
From: Trond Myklebust @ 2016-07-06 22:29 UTC (permalink / raw)
  To: linux-nfs

filemap_fdatawrite() and friends already deal just fine with livelock.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/file.c          |  8 --------
 fs/nfs/nfstrace.h      |  1 -
 fs/nfs/write.c         | 11 -----------
 include/linux/nfs_fs.h |  1 -
 4 files changed, 21 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 2d39d9f9da7d..29d7477a62e8 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -360,14 +360,6 @@ static int nfs_write_begin(struct file *file, struct address_space *mapping,
 
 start:
 	/*
-	 * Prevent starvation issues if someone is doing a consistency
-	 * sync-to-disk
-	 */
-	ret = wait_on_bit_action(&NFS_I(mapping->host)->flags, NFS_INO_FLUSHING,
-				 nfs_wait_bit_killable, TASK_KILLABLE);
-	if (ret)
-		return ret;
-	/*
 	 * Wait for O_DIRECT to complete
 	 */
 	inode_dio_wait(mapping->host);
diff --git a/fs/nfs/nfstrace.h b/fs/nfs/nfstrace.h
index 0b9e5cc9a747..fe80a1c26340 100644
--- a/fs/nfs/nfstrace.h
+++ b/fs/nfs/nfstrace.h
@@ -37,7 +37,6 @@
 			{ 1 << NFS_INO_ADVISE_RDPLUS, "ADVISE_RDPLUS" }, \
 			{ 1 << NFS_INO_STALE, "STALE" }, \
 			{ 1 << NFS_INO_INVALIDATING, "INVALIDATING" }, \
-			{ 1 << NFS_INO_FLUSHING, "FLUSHING" }, \
 			{ 1 << NFS_INO_FSCACHE, "FSCACHE" }, \
 			{ 1 << NFS_INO_LAYOUTCOMMIT, "NEED_LAYOUTCOMMIT" }, \
 			{ 1 << NFS_INO_LAYOUTCOMMITTING, "LAYOUTCOMMIT" })
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index e1c74d3db64d..980d44f3a84c 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -657,16 +657,9 @@ static int nfs_writepages_callback(struct page *page, struct writeback_control *
 int nfs_writepages(struct address_space *mapping, struct writeback_control *wbc)
 {
 	struct inode *inode = mapping->host;
-	unsigned long *bitlock = &NFS_I(inode)->flags;
 	struct nfs_pageio_descriptor pgio;
 	int err;
 
-	/* Stop dirtying of new pages while we sync */
-	err = wait_on_bit_lock_action(bitlock, NFS_INO_FLUSHING,
-			nfs_wait_bit_killable, TASK_KILLABLE);
-	if (err)
-		goto out_err;
-
 	nfs_inc_stats(inode, NFSIOS_VFSWRITEPAGES);
 
 	nfs_pageio_init_write(&pgio, inode, wb_priority(wbc), false,
@@ -674,10 +667,6 @@ int nfs_writepages(struct address_space *mapping, struct writeback_control *wbc)
 	err = write_cache_pages(mapping, wbc, nfs_writepages_callback, &pgio);
 	nfs_pageio_complete(&pgio);
 
-	clear_bit_unlock(NFS_INO_FLUSHING, bitlock);
-	smp_mb__after_atomic();
-	wake_up_bit(bitlock, NFS_INO_FLUSHING);
-
 	if (err < 0)
 		goto out_err;
 	err = pgio.pg_error;
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index d71278c3c5bd..120dd04b553c 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -205,7 +205,6 @@ struct nfs_inode {
 #define NFS_INO_STALE		(1)		/* possible stale inode */
 #define NFS_INO_ACL_LRU_SET	(2)		/* Inode is on the LRU list */
 #define NFS_INO_INVALIDATING	(3)		/* inode is being invalidated */
-#define NFS_INO_FLUSHING	(4)		/* inode is flushing out data */
 #define NFS_INO_FSCACHE		(5)		/* inode can be cached by FS-Cache */
 #define NFS_INO_FSCACHE_LOCK	(6)		/* FS-Cache cookie management lock */
 #define NFS_INO_LAYOUTCOMMIT	(9)		/* layoutcommit required */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 69+ messages in thread
* [PATCH v4 05/28] NFS: writepage of a single page should not be synchronous
  2016-07-06 22:29 ` [PATCH v4 04/28] NFS: Kill NFS_INO_NFS_INO_FLUSHING: it is a performance killer Trond Myklebust
@ 2016-07-06 22:29 ` Trond Myklebust
  2016-07-06 22:29 ` [PATCH v4 06/28] NFS: Don't hold the inode lock across fsync() Trond Myklebust
  0 siblings, 1 reply; 69+ messages in thread
From: Trond Myklebust @ 2016-07-06 22:29 UTC (permalink / raw)
  To: linux-nfs

It is almost always better to wait for more so that we can issue a
bulk commit.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/write.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 980d44f3a84c..b13d48881d3a 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -625,7 +625,7 @@ static int nfs_writepage_locked(struct page *page,
 	int err;
 
 	nfs_inc_stats(inode, NFSIOS_VFSWRITEPAGE);
-	nfs_pageio_init_write(&pgio, inode, wb_priority(wbc),
+	nfs_pageio_init_write(&pgio, inode, 0,
 				false, &nfs_async_write_completion_ops);
 	err = nfs_do_writepage(page, wbc, &pgio, launder);
 	nfs_pageio_complete(&pgio);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 69+ messages in thread
* [PATCH v4 06/28] NFS: Don't hold the inode lock across fsync()
  2016-07-06 22:29 ` [PATCH v4 05/28] NFS: writepage of a single page should not be synchronous Trond Myklebust
@ 2016-07-06 22:29 ` Trond Myklebust
  2016-07-06 22:29 ` [PATCH v4 07/28] NFS: Don't call COMMIT in ->releasepage() Trond Myklebust
  0 siblings, 1 reply; 69+ messages in thread
From: Trond Myklebust @ 2016-07-06 22:29 UTC (permalink / raw)
  To: linux-nfs

Commits are no longer required to be serialised.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/file.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 29d7477a62e8..249262b6bcbe 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -277,11 +277,9 @@ nfs_file_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 		ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
 		if (ret != 0)
 			break;
-		inode_lock(inode);
 		ret = nfs_file_fsync_commit(file, start, end, datasync);
 		if (!ret)
 			ret = pnfs_sync_inode(inode, !!datasync);
-		inode_unlock(inode);
 		/*
 		 * If nfs_file_fsync_commit detected a server reboot, then
 		 * resend all dirty pages that might have been covered by
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 69+ messages in thread
* [PATCH v4 07/28] NFS: Don't call COMMIT in ->releasepage()
  2016-07-06 22:29 ` [PATCH v4 06/28] NFS: Don't hold the inode lock across fsync() Trond Myklebust
@ 2016-07-06 22:29 ` Trond Myklebust
  2016-07-06 22:29 ` [PATCH v4 08/28] pNFS/files: Fix layoutcommit after a commit to DS Trond Myklebust
  0 siblings, 1 reply; 69+ messages in thread
From: Trond Myklebust @ 2016-07-06 22:29 UTC (permalink / raw)
  To: linux-nfs

While COMMIT has the potential to free up a lot of memory that is being
taken by unstable writes, it isn't guaranteed to free up this particular
page. Also, calling fsync() on the server is expensive and so we want to
do it in a more controlled fashion, rather than have it triggered at
random by the VM.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/file.c | 23 -----------------------
 1 file changed, 23 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 249262b6bcbe..df4dd8e7e62e 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -460,31 +460,8 @@ static void nfs_invalidate_page(struct page *page, unsigned int offset,
  */
 static int nfs_release_page(struct page *page, gfp_t gfp)
 {
-	struct address_space *mapping = page->mapping;
-
 	dfprintk(PAGECACHE, "NFS: release_page(%p)\n", page);
 
-	/* Always try to initiate a 'commit' if relevant, but only
-	 * wait for it if the caller allows blocking. Even then,
-	 * only wait 1 second and only if the 'bdi' is not congested.
-	 * Waiting indefinitely can cause deadlocks when the NFS
-	 * server is on this machine, when a new TCP connection is
-	 * needed and in other rare cases. There is no particular
-	 * need to wait extensively here. A short wait has the
-	 * benefit that someone else can worry about the freezer.
-	 */
-	if (mapping) {
-		struct nfs_server *nfss = NFS_SERVER(mapping->host);
-		nfs_commit_inode(mapping->host, 0);
-		if (gfpflags_allow_blocking(gfp) &&
-		    !bdi_write_congested(&nfss->backing_dev_info)) {
-			wait_on_page_bit_killable_timeout(page, PG_private,
-							  HZ);
-			if (PagePrivate(page))
-				set_bdi_congested(&nfss->backing_dev_info,
-						  BLK_RW_ASYNC);
-		}
-	}
 	/* If PagePrivate() is set, then the page is not freeable */
 	if (PagePrivate(page))
 		return 0;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 69+ messages in thread
* [PATCH v4 08/28] pNFS/files: Fix layoutcommit after a commit to DS
  2016-07-06 22:29 ` [PATCH v4 07/28] NFS: Don't call COMMIT in ->releasepage() Trond Myklebust
@ 2016-07-06 22:29 ` Trond Myklebust
  2016-07-06 22:29 ` [PATCH v4 09/28] pNFS/flexfiles: " Trond Myklebust
  0 siblings, 1 reply; 69+ messages in thread
From: Trond Myklebust @ 2016-07-06 22:29 UTC (permalink / raw)
  To: linux-nfs

According to the errata
https://www.rfc-editor.org/errata_search.php?rfc=5661&eid=2751
we should always send layout commit after a commit to DS.

Fixes: bc7d4b8fd091 ("nfs/filelayout: set layoutcommit...")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/filelayout/filelayout.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c
index aa59757389dc..b4c1407e8fe4 100644
--- a/fs/nfs/filelayout/filelayout.c
+++ b/fs/nfs/filelayout/filelayout.c
@@ -375,8 +375,7 @@ static int filelayout_commit_done_cb(struct rpc_task *task,
 		return -EAGAIN;
 	}
 
-	if (data->verf.committed == NFS_UNSTABLE)
-		pnfs_set_layoutcommit(data->inode, data->lseg, data->lwb);
+	pnfs_set_layoutcommit(data->inode, data->lseg, data->lwb);
 
 	return 0;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 69+ messages in thread
* [PATCH v4 09/28] pNFS/flexfiles: Fix layoutcommit after a commit to DS
  2016-07-06 22:29 ` [PATCH v4 08/28] pNFS/files: Fix layoutcommit after a commit to DS Trond Myklebust
@ 2016-07-06 22:29 ` Trond Myklebust
  2016-07-06 22:29 ` [PATCH v4 10/28] pNFS/flexfiles: Clean up calls to pnfs_set_layoutcommit() Trond Myklebust
  0 siblings, 1 reply; 69+ messages in thread
From: Trond Myklebust @ 2016-07-06 22:29 UTC (permalink / raw)
  To: linux-nfs

We should always do a layoutcommit after commit to DS, except if
the layout segment we're using has set FF_FLAGS_NO_LAYOUTCOMMIT.

Fixes: d67ae825a59d ("pnfs/flexfiles: Add the FlexFile Layout Driver")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/flexfilelayout/flexfilelayout.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index 0e8018bc9880..2689c9e9dc3c 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -1530,8 +1530,7 @@ static int ff_layout_commit_done_cb(struct rpc_task *task,
 		return -EAGAIN;
 	}
 
-	if (data->verf.committed == NFS_UNSTABLE
-	    && ff_layout_need_layoutcommit(data->lseg))
+	if (ff_layout_need_layoutcommit(data->lseg))
 		pnfs_set_layoutcommit(data->inode, data->lseg, data->lwb);
 
 	return 0;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 69+ messages in thread
* [PATCH v4 10/28] pNFS/flexfiles: Clean up calls to pnfs_set_layoutcommit()
  2016-07-06 22:29 ` [PATCH v4 09/28] pNFS/flexfiles: " Trond Myklebust
@ 2016-07-06 22:29 ` Trond Myklebust
  2016-07-06 22:29 ` [PATCH v4 11/28] pNFS: Files and flexfiles always need to commit before layoutcommit Trond Myklebust
  0 siblings, 1 reply; 69+ messages in thread
From: Trond Myklebust @ 2016-07-06 22:29 UTC (permalink / raw)
  To: linux-nfs

Let's just have one place where we check ff_layout_need_layoutcommit().

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/flexfilelayout/flexfilelayout.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index 2689c9e9dc3c..14f2ed3f1a5b 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -1325,15 +1325,16 @@ ff_layout_need_layoutcommit(struct pnfs_layout_segment *lseg)
  * we always send layoutcommit after DS writes.
  */
 static void
-ff_layout_set_layoutcommit(struct nfs_pgio_header *hdr)
+ff_layout_set_layoutcommit(struct inode *inode,
+		struct pnfs_layout_segment *lseg,
+		loff_t end_offset)
 {
-	if (!ff_layout_need_layoutcommit(hdr->lseg))
+	if (!ff_layout_need_layoutcommit(lseg))
 		return;
 
-	pnfs_set_layoutcommit(hdr->inode, hdr->lseg,
-				hdr->mds_offset + hdr->res.count);
-	dprintk("%s inode %lu pls_end_pos %lu\n", __func__, hdr->inode->i_ino,
-		(unsigned long) NFS_I(hdr->inode)->layout->plh_lwb);
+	pnfs_set_layoutcommit(inode, lseg, end_offset);
+	dprintk("%s inode %lu pls_end_pos %llu\n", __func__, inode->i_ino,
+		(unsigned long long) NFS_I(inode)->layout->plh_lwb);
 }
 
 static bool
@@ -1494,7 +1495,8 @@ static int ff_layout_write_done_cb(struct rpc_task *task,
 
 	if (hdr->res.verf->committed == NFS_FILE_SYNC ||
 	    hdr->res.verf->committed == NFS_DATA_SYNC)
-		ff_layout_set_layoutcommit(hdr);
+		ff_layout_set_layoutcommit(hdr->inode, hdr->lseg,
+				hdr->mds_offset + (loff_t)hdr->res.count);
 
 	/* zero out fattr since we don't care DS attr at all */
 	hdr->fattr.valid = 0;
@@ -1530,8 +1532,7 @@ static int ff_layout_commit_done_cb(struct rpc_task *task,
 		return -EAGAIN;
 	}
 
-	if (ff_layout_need_layoutcommit(data->lseg))
-		pnfs_set_layoutcommit(data->inode, data->lseg, data->lwb);
+	ff_layout_set_layoutcommit(data->inode, data->lseg, data->lwb);
 
 	return 0;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 69+ messages in thread
* [PATCH v4 11/28] pNFS: Files and flexfiles always need to commit before layoutcommit 2016-07-06 22:29 ` [PATCH v4 10/28] pNFS/flexfiles: Clean up calls to pnfs_set_layoutcommit() Trond Myklebust @ 2016-07-06 22:29 ` Trond Myklebust 2016-07-06 22:29 ` [PATCH v4 12/28] pNFS: Ensure we layoutcommit before revalidating attributes Trond Myklebust 0 siblings, 1 reply; 69+ messages in thread From: Trond Myklebust @ 2016-07-06 22:29 UTC (permalink / raw) To: linux-nfs So ensure that we mark the layout for commit once the write is done, and then ensure that the commit to ds is finished before sending layoutcommit. Note that by doing this, we're able to optimise away the commit for the case of servers that don't need layoutcommit in order to return updated attributes. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> --- fs/nfs/filelayout/filelayout.c | 9 ++++++--- fs/nfs/flexfilelayout/flexfilelayout.c | 7 +++++-- fs/nfs/nfs4xdr.c | 11 ++++++++--- fs/nfs/pnfs.c | 5 ++++- fs/nfs/pnfs_nfs.c | 7 +++++++ 5 files changed, 30 insertions(+), 9 deletions(-) diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c index b4c1407e8fe4..25bd91a6e088 100644 --- a/fs/nfs/filelayout/filelayout.c +++ b/fs/nfs/filelayout/filelayout.c @@ -255,13 +255,16 @@ static int filelayout_read_done_cb(struct rpc_task *task, static void filelayout_set_layoutcommit(struct nfs_pgio_header *hdr) { + loff_t end_offs = 0; if (FILELAYOUT_LSEG(hdr->lseg)->commit_through_mds || - hdr->res.verf->committed != NFS_DATA_SYNC) + hdr->res.verf->committed == NFS_FILE_SYNC) return; + if (hdr->res.verf->committed == NFS_DATA_SYNC) + end_offs = hdr->mds_offset + (loff_t)hdr->res.count; - pnfs_set_layoutcommit(hdr->inode, hdr->lseg, - hdr->mds_offset + hdr->res.count); + /* Note: if the write is unstable, don't set end_offs until commit */ + pnfs_set_layoutcommit(hdr->inode, hdr->lseg, end_offs); dprintk("%s inode %lu pls_end_pos %lu\n", __func__, hdr->inode->i_ino, (unsigned long) 
NFS_I(hdr->inode)->layout->plh_lwb); } diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c index 14f2ed3f1a5b..e6206eaf2bdf 100644 --- a/fs/nfs/flexfilelayout/flexfilelayout.c +++ b/fs/nfs/flexfilelayout/flexfilelayout.c @@ -1470,6 +1470,7 @@ static void ff_layout_read_release(void *data) static int ff_layout_write_done_cb(struct rpc_task *task, struct nfs_pgio_header *hdr) { + loff_t end_offs = 0; int err; trace_nfs4_pnfs_write(hdr, task->tk_status); @@ -1495,8 +1496,10 @@ static int ff_layout_write_done_cb(struct rpc_task *task, if (hdr->res.verf->committed == NFS_FILE_SYNC || hdr->res.verf->committed == NFS_DATA_SYNC) - ff_layout_set_layoutcommit(hdr->inode, hdr->lseg, - hdr->mds_offset + (loff_t)hdr->res.count); + end_offs = hdr->mds_offset + (loff_t)hdr->res.count; + + /* Note: if the write is unstable, don't set end_offs until commit */ + ff_layout_set_layoutcommit(hdr->inode, hdr->lseg, end_offs); /* zero out fattr since we don't care DS attr at all */ hdr->fattr.valid = 0; diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c index 661e753fe1c9..7bd3a5c09d31 100644 --- a/fs/nfs/nfs4xdr.c +++ b/fs/nfs/nfs4xdr.c @@ -1985,9 +1985,14 @@ encode_layoutcommit(struct xdr_stream *xdr, p = xdr_encode_hyper(p, args->lastbytewritten + 1); /* length */ *p = cpu_to_be32(0); /* reclaim */ encode_nfs4_stateid(xdr, &args->stateid); - p = reserve_space(xdr, 20); - *p++ = cpu_to_be32(1); /* newoffset = TRUE */ - p = xdr_encode_hyper(p, args->lastbytewritten); + if (args->lastbytewritten != U64_MAX) { + p = reserve_space(xdr, 20); + *p++ = cpu_to_be32(1); /* newoffset = TRUE */ + p = xdr_encode_hyper(p, args->lastbytewritten); + } else { + p = reserve_space(xdr, 12); + *p++ = cpu_to_be32(0); /* newoffset = FALSE */ + } *p++ = cpu_to_be32(0); /* Never send time_modify_changed */ *p++ = cpu_to_be32(NFS_SERVER(args->inode)->pnfs_curr_ld->id);/* type */ diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c index 0c7e0d45a4de..62553182514e 100644 --- 
a/fs/nfs/pnfs.c +++ b/fs/nfs/pnfs.c @@ -2378,7 +2378,10 @@ pnfs_layoutcommit_inode(struct inode *inode, bool sync) nfs_fattr_init(&data->fattr); data->args.bitmask = NFS_SERVER(inode)->cache_consistency_bitmask; data->res.fattr = &data->fattr; - data->args.lastbytewritten = end_pos - 1; + if (end_pos != 0) + data->args.lastbytewritten = end_pos - 1; + else + data->args.lastbytewritten = U64_MAX; data->res.server = NFS_SERVER(inode); if (ld->prepare_layoutcommit) { diff --git a/fs/nfs/pnfs_nfs.c b/fs/nfs/pnfs_nfs.c index 0dfc476da3e1..0d10cc280a23 100644 --- a/fs/nfs/pnfs_nfs.c +++ b/fs/nfs/pnfs_nfs.c @@ -932,6 +932,13 @@ EXPORT_SYMBOL_GPL(pnfs_layout_mark_request_commit); int pnfs_nfs_generic_sync(struct inode *inode, bool datasync) { + int ret; + + if (!pnfs_layoutcommit_outstanding(inode)) + return 0; + ret = nfs_commit_inode(inode, FLUSH_SYNC); + if (ret < 0) + return ret; if (datasync) return 0; return pnfs_layoutcommit_inode(inode, true); -- 2.7.4 ^ permalink raw reply related [flat|nested] 69+ messages in thread
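The sentinel handling in patch 11 can be sketched in user space. This is a minimal sketch, not the kernel code: `put_be32()`, `put_be64()`, `last_byte_written()` and `encode_newoffset()` are hypothetical stand-ins for the XDR helpers and for the two hunks in fs/nfs/pnfs.c and fs/nfs/nfs4xdr.c, and only the newoffset part of the LAYOUTCOMMIT arguments is modelled. An end position of 0 (no stable write has recorded an offset yet) becomes the U64_MAX sentinel, which the encoder then turns into newoffset = FALSE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's XDR helpers: store one
 * big-endian 32-bit word and return the position after it. */
static uint8_t *put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
	return p + 4;
}

/* Store a big-endian 64-bit value as two 32-bit words. */
static uint8_t *put_be64(uint8_t *p, uint64_t v)
{
	p = put_be32(p, (uint32_t)(v >> 32));
	return put_be32(p, (uint32_t)v);
}

/* Models the pnfs_layoutcommit_inode() hunk: end_pos == 0 means that
 * no end offset was recorded, so a sentinel is used instead. */
static uint64_t last_byte_written(uint64_t end_pos)
{
	return end_pos != 0 ? end_pos - 1 : UINT64_MAX;
}

/* Models the encode_layoutcommit() hunk for the newoffset part only.
 * Returns the number of bytes written to buf (4 or 12). */
static size_t encode_newoffset(uint8_t *buf, uint64_t lastbytewritten)
{
	uint8_t *p = buf;

	if (lastbytewritten != UINT64_MAX) {
		p = put_be32(p, 1);		/* newoffset = TRUE */
		p = put_be64(p, lastbytewritten);
	} else {
		p = put_be32(p, 0);		/* newoffset = FALSE */
	}
	return (size_t)(p - buf);
}
```

With this shape, a layoutcommit that follows only unstable writes carries no new offset at all, rather than a bogus one derived from `end_pos - 1` wrapping around.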
* [PATCH v4 12/28] pNFS: Ensure we layoutcommit before revalidating attributes 2016-07-06 22:29 ` [PATCH v4 11/28] pNFS: Files and flexfiles always need to commit before layoutcommit Trond Myklebust @ 2016-07-06 22:29 ` Trond Myklebust 2016-07-06 22:29 ` [PATCH v4 13/28] pNFS: pnfs_layoutcommit_outstanding() is no longer used when !CONFIG_NFS_V4_1 Trond Myklebust 0 siblings, 1 reply; 69+ messages in thread From: Trond Myklebust @ 2016-07-06 22:29 UTC (permalink / raw) To: linux-nfs If we need to update the cached attributes, then we'd better make sure that we also layoutcommit first. Otherwise, the server may have stale attributes. Prior to this patch, the revalidation code tried to "fix" this problem by simply disabling attributes that would be affected by the layoutcommit. That approach breaks nfs_writeback_check_extend(), leading to a file size corruption. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> --- fs/nfs/inode.c | 23 +++++++---------------- 1 file changed, 7 insertions(+), 16 deletions(-) diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c index 4e65a5a8a01b..6c0618eb5d57 100644 --- a/fs/nfs/inode.c +++ b/fs/nfs/inode.c @@ -974,6 +974,13 @@ __nfs_revalidate_inode(struct nfs_server *server, struct inode *inode) if (NFS_STALE(inode)) goto out; + /* pNFS: Attributes aren't updated until we layoutcommit */ + if (S_ISREG(inode->i_mode)) { + status = pnfs_sync_inode(inode, false); + if (status) + goto out; + } + status = -ENOMEM; fattr = nfs_alloc_fattr(); if (fattr == NULL) @@ -1493,28 +1500,12 @@ static int nfs_inode_attrs_need_update(const struct inode *inode, const struct n ((long)nfsi->attr_gencount - (long)nfs_read_attr_generation_counter() > 0); } -/* - * Don't trust the change_attribute, mtime, ctime or size if - * a pnfs LAYOUTCOMMIT is outstanding - */ -static void nfs_inode_attrs_handle_layoutcommit(struct inode *inode, - struct nfs_fattr *fattr) -{ - if (pnfs_layoutcommit_outstanding(inode)) - fattr->valid &= ~(NFS_ATTR_FATTR_CHANGE | - 
NFS_ATTR_FATTR_MTIME | - NFS_ATTR_FATTR_CTIME | - NFS_ATTR_FATTR_SIZE); -} - static int nfs_refresh_inode_locked(struct inode *inode, struct nfs_fattr *fattr) { int ret; trace_nfs_refresh_inode_enter(inode); - nfs_inode_attrs_handle_layoutcommit(inode, fattr); - if (nfs_inode_attrs_need_update(inode, fattr)) ret = nfs_update_inode(inode, fattr); else -- 2.7.4 ^ permalink raw reply related [flat|nested] 69+ messages in thread
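The difference between the two approaches in patch 12 can be illustrated with a toy model. This is a sketch under stated assumptions: the `TOY_ATTR_*` bit values, `struct toy_inode` and the `toy_*` functions are invented for illustration and do not match the kernel's definitions. The old helper silently dropped the affected attribute bits while a layoutcommit was outstanding; the new code performs the layoutcommit first and then trusts everything the server returns.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; the kernel defines its own. */
#define TOY_ATTR_CHANGE	(1U << 0)
#define TOY_ATTR_MTIME	(1U << 1)
#define TOY_ATTR_CTIME	(1U << 2)
#define TOY_ATTR_SIZE	(1U << 3)
#define TOY_ATTR_MODE	(1U << 4)

struct toy_inode {
	bool layoutcommit_outstanding;
	int getattrs;		/* number of attribute fetches issued */
};

/* Old approach: discard attributes that a pending layoutcommit could
 * still change, so the caller never sees size, change or time updates. */
static unsigned int toy_old_mask_attrs(unsigned int valid, bool outstanding)
{
	if (outstanding)
		valid &= ~(TOY_ATTR_CHANGE | TOY_ATTR_MTIME |
			   TOY_ATTR_CTIME | TOY_ATTR_SIZE);
	return valid;
}

/* Stand-in for pnfs_sync_inode(): pretend the layoutcommit succeeded. */
static int toy_sync_inode(struct toy_inode *inode)
{
	inode->layoutcommit_outstanding = false;
	return 0;
}

/* New approach: layoutcommit first, then fetch attributes. */
static int toy_revalidate(struct toy_inode *inode)
{
	int status = toy_sync_inode(inode);

	if (status)
		return status;
	inode->getattrs++;
	return 0;
}
```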
* [PATCH v4 13/28] pNFS: pnfs_layoutcommit_outstanding() is no longer used when !CONFIG_NFS_V4_1 2016-07-06 22:29 ` [PATCH v4 12/28] pNFS: Ensure we layoutcommit before revalidating attributes Trond Myklebust @ 2016-07-06 22:29 ` Trond Myklebust 2016-07-06 22:29 ` [PATCH v4 14/28] NFS: Fix O_DIRECT verifier problems Trond Myklebust 0 siblings, 1 reply; 69+ messages in thread From: Trond Myklebust @ 2016-07-06 22:29 UTC (permalink / raw) To: linux-nfs Cleanup... Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> --- fs/nfs/pnfs.h | 7 ------- 1 file changed, 7 deletions(-) diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h index b21bd0bee784..d6be5299a55a 100644 --- a/fs/nfs/pnfs.h +++ b/fs/nfs/pnfs.h @@ -716,13 +716,6 @@ pnfs_use_threshold(struct nfs4_threshold **dst, struct nfs4_threshold *src, return false; } -static inline bool -pnfs_layoutcommit_outstanding(struct inode *inode) -{ - return false; -} - - static inline struct nfs4_threshold *pnfs_mdsthreshold_alloc(void) { return NULL; -- 2.7.4 ^ permalink raw reply related [flat|nested] 69+ messages in thread
* [PATCH v4 14/28] NFS: Fix O_DIRECT verifier problems 2016-07-06 22:29 ` [PATCH v4 13/28] pNFS: pnfs_layoutcommit_outstanding() is no longer used when !CONFIG_NFS_V4_1 Trond Myklebust @ 2016-07-06 22:29 ` Trond Myklebust 2016-07-06 22:29 ` [PATCH v4 15/28] NFS: Ensure we reset the write verifier 'committed' value on resend Trond Myklebust 0 siblings, 1 reply; 69+ messages in thread From: Trond Myklebust @ 2016-07-06 22:29 UTC (permalink / raw) To: linux-nfs We should not be interested in looking at the value of the stable field, since that could take any value. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> --- fs/nfs/direct.c | 10 ++++++++-- fs/nfs/internal.h | 7 +++++++ fs/nfs/write.c | 2 +- 3 files changed, 16 insertions(+), 3 deletions(-) diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c index 979b3c4dee6a..d6d43b5eafb3 100644 --- a/fs/nfs/direct.c +++ b/fs/nfs/direct.c @@ -196,6 +196,12 @@ static void nfs_direct_set_hdr_verf(struct nfs_direct_req *dreq, WARN_ON_ONCE(verfp->committed < 0); } +static int nfs_direct_cmp_verf(const struct nfs_writeverf *v1, + const struct nfs_writeverf *v2) +{ + return nfs_write_verifier_cmp(&v1->verifier, &v2->verifier); +} + /* * nfs_direct_cmp_hdr_verf - compare verifier for pgio header * @dreq - direct request possibly spanning multiple servers @@ -215,7 +221,7 @@ static int nfs_direct_set_or_cmp_hdr_verf(struct nfs_direct_req *dreq, nfs_direct_set_hdr_verf(dreq, hdr); return 0; } - return memcmp(verfp, &hdr->verf, sizeof(struct nfs_writeverf)); + return nfs_direct_cmp_verf(verfp, &hdr->verf); } /* @@ -238,7 +244,7 @@ static int nfs_direct_cmp_commit_data_verf(struct nfs_direct_req *dreq, if (verfp->committed < 0) return 1; - return memcmp(verfp, &data->verf, sizeof(struct nfs_writeverf)); + return nfs_direct_cmp_verf(verfp, &data->verf); } /** diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h index 5154fa65a2f2..150a8eb0f323 100644 --- a/fs/nfs/internal.h +++ b/fs/nfs/internal.h @@ -506,6 +506,13 @@ extern 
int nfs_migrate_page(struct address_space *, #define nfs_migrate_page NULL #endif +static inline int +nfs_write_verifier_cmp(const struct nfs_write_verifier *v1, + const struct nfs_write_verifier *v2) +{ + return memcmp(v1->data, v2->data, sizeof(v1->data)); +} + /* unlink.c */ extern struct rpc_task * nfs_async_rename(struct inode *old_dir, struct inode *new_dir, diff --git a/fs/nfs/write.c b/fs/nfs/write.c index b13d48881d3a..3087fb6f1983 100644 --- a/fs/nfs/write.c +++ b/fs/nfs/write.c @@ -1789,7 +1789,7 @@ static void nfs_commit_release_pages(struct nfs_commit_data *data) /* Okay, COMMIT succeeded, apparently. Check the verifier * returned by the server against all stored verfs. */ - if (!memcmp(&req->wb_verf, &data->verf.verifier, sizeof(req->wb_verf))) { + if (!nfs_write_verifier_cmp(&req->wb_verf, &data->verf.verifier)) { /* We have a match */ nfs_inode_remove_request(req); dprintk(" OK\n"); -- 2.7.4 ^ permalink raw reply related [flat|nested] 69+ messages in thread
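Why patch 14 replaces the whole-structure memcmp() can be shown with a small user-space sketch. The structures below are simplified stand-ins (the field names follow the patch, the exact layout is an assumption of this sketch), and `verifier_cmp()` and `whole_struct_cmp()` are hypothetical names. Two replies from the same server instance carry the same verifier bytes but may report different `committed` values, so comparing the whole structure can report a mismatch where none exists.

```c
#include <assert.h>
#include <string.h>

/* Simplified stand-ins for the kernel structures. */
struct write_verifier {
	char data[8];
};

struct writeverf {
	struct write_verifier verifier;
	int committed;	/* stable_how value reported by the server */
};

/* Compare only the opaque verifier bytes, in the spirit of
 * nfs_write_verifier_cmp() from the patch. */
static int verifier_cmp(const struct write_verifier *v1,
			const struct write_verifier *v2)
{
	return memcmp(v1->data, v2->data, sizeof(v1->data));
}

/* Pre-patch behaviour: compare the whole structure, which also
 * compares the committed field. */
static int whole_struct_cmp(const struct writeverf *v1,
			    const struct writeverf *v2)
{
	return memcmp(v1, v2, sizeof(*v1));
}
```

A false mismatch of this kind would presumably make the O_DIRECT code treat the write as needing to be resent, which is why the comparison is restricted to the verifier itself.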
* [PATCH v4 15/28] NFS: Ensure we reset the write verifier 'committed' value on resend. 2016-07-06 22:29 ` [PATCH v4 14/28] NFS: Fix O_DIRECT verifier problems Trond Myklebust @ 2016-07-06 22:29 ` Trond Myklebust 2016-07-06 22:29 ` [PATCH v4 16/28] NFS: Remove racy size manipulations in O_DIRECT Trond Myklebust 0 siblings, 1 reply; 69+ messages in thread From: Trond Myklebust @ 2016-07-06 22:29 UTC (permalink / raw) To: linux-nfs Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> --- fs/nfs/direct.c | 2 ++ fs/nfs/internal.h | 17 +++++++++++++++++ 2 files changed, 19 insertions(+) diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c index d6d43b5eafb3..fb659bb50678 100644 --- a/fs/nfs/direct.c +++ b/fs/nfs/direct.c @@ -661,6 +661,8 @@ static void nfs_direct_write_reschedule(struct nfs_direct_req *dreq) nfs_direct_write_scan_commit_list(dreq->inode, &reqs, &cinfo); dreq->count = 0; + dreq->verf.committed = NFS_INVALID_STABLE_HOW; + nfs_clear_pnfs_ds_commit_verifiers(&dreq->ds_cinfo); for (i = 0; i < dreq->mirror_count; i++) dreq->mirrors[i].count = 0; get_dreq(dreq); diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h index 150a8eb0f323..0eb5c924886d 100644 --- a/fs/nfs/internal.h +++ b/fs/nfs/internal.h @@ -499,6 +499,23 @@ int nfs_key_timeout_notify(struct file *filp, struct inode *inode); bool nfs_ctx_key_to_expire(struct nfs_open_context *ctx); void nfs_pageio_stop_mirroring(struct nfs_pageio_descriptor *pgio); +#ifdef CONFIG_NFS_V4_1 +static inline +void nfs_clear_pnfs_ds_commit_verifiers(struct pnfs_ds_commit_info *cinfo) +{ + int i; + + for (i = 0; i < cinfo->nbuckets; i++) + cinfo->buckets[i].direct_verf.committed = NFS_INVALID_STABLE_HOW; +} +#else +static inline +void nfs_clear_pnfs_ds_commit_verifiers(struct pnfs_ds_commit_info *cinfo) +{ +} +#endif + + #ifdef CONFIG_MIGRATION extern int nfs_migrate_page(struct address_space *, struct page *, struct page *, enum migrate_mode); -- 2.7.4 ^ permalink raw reply related [flat|nested] 69+ messages in 
thread
* [PATCH v4 16/28] NFS: Remove racy size manipulations in O_DIRECT 2016-07-06 22:29 ` [PATCH v4 15/28] NFS: Ensure we reset the write verifier 'committed' value on resend Trond Myklebust @ 2016-07-06 22:29 ` Trond Myklebust 2016-07-06 22:29 ` [PATCH v4 17/28] NFS Cleanup: move call to generic_write_checks() into fs/nfs/direct.c Trond Myklebust 0 siblings, 1 reply; 69+ messages in thread From: Trond Myklebust @ 2016-07-06 22:29 UTC (permalink / raw) To: linux-nfs On success, the RPC callbacks will ensure that we make the appropriate calls to nfs_writeback_update_inode() Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> --- fs/nfs/direct.c | 16 ---------------- 1 file changed, 16 deletions(-) diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c index fb659bb50678..826d4dace0e5 100644 --- a/fs/nfs/direct.c +++ b/fs/nfs/direct.c @@ -376,15 +376,6 @@ static void nfs_direct_complete(struct nfs_direct_req *dreq, bool write) { struct inode *inode = dreq->inode; - if (dreq->iocb && write) { - loff_t pos = dreq->iocb->ki_pos + dreq->count; - - spin_lock(&inode->i_lock); - if (i_size_read(inode) < pos) - i_size_write(inode, pos); - spin_unlock(&inode->i_lock); - } - if (write) nfs_zap_mapping(inode, inode->i_mapping); @@ -1058,14 +1049,7 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter) if (!result) { result = nfs_direct_wait(dreq); if (result > 0) { - struct inode *inode = mapping->host; - iocb->ki_pos = pos + result; - spin_lock(&inode->i_lock); - if (i_size_read(inode) < iocb->ki_pos) - i_size_write(inode, iocb->ki_pos); - spin_unlock(&inode->i_lock); - /* XXX: should check the generic_write_sync retval */ generic_write_sync(iocb, result); } -- 2.7.4 ^ permalink raw reply related [flat|nested] 69+ messages in thread
* [PATCH v4 17/28] NFS Cleanup: move call to generic_write_checks() into fs/nfs/direct.c 2016-07-06 22:29 ` [PATCH v4 16/28] NFS: Remove racy size manipulations in O_DIRECT Trond Myklebust @ 2016-07-06 22:29 ` Trond Myklebust 2016-07-06 22:29 ` [PATCH v4 18/28] NFS: Move buffered I/O locking into nfs_file_write() Trond Myklebust 0 siblings, 1 reply; 69+ messages in thread From: Trond Myklebust @ 2016-07-06 22:29 UTC (permalink / raw) To: linux-nfs Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> --- fs/nfs/direct.c | 12 ++++++++---- fs/nfs/file.c | 6 +----- 2 files changed, 9 insertions(+), 9 deletions(-) diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c index 826d4dace0e5..0169eca8eb42 100644 --- a/fs/nfs/direct.c +++ b/fs/nfs/direct.c @@ -988,6 +988,7 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq, ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter) { ssize_t result = -EINVAL; + size_t count; struct file *file = iocb->ki_filp; struct address_space *mapping = file->f_mapping; struct inode *inode = mapping->host; @@ -998,8 +999,11 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter) dfprintk(FILE, "NFS: direct write(%pD2, %zd@%Ld)\n", file, iov_iter_count(iter), (long long) iocb->ki_pos); - nfs_add_stats(mapping->host, NFSIOS_DIRECTWRITTENBYTES, - iov_iter_count(iter)); + result = generic_write_checks(iocb, iter); + if (result <= 0) + return result; + count = result; + nfs_add_stats(mapping->host, NFSIOS_DIRECTWRITTENBYTES, count); pos = iocb->ki_pos; end = (pos + iov_iter_count(iter) - 1) >> PAGE_SHIFT; @@ -1017,7 +1021,7 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter) goto out_unlock; } - task_io_account_write(iov_iter_count(iter)); + task_io_account_write(count); result = -ENOMEM; dreq = nfs_direct_req_alloc(); @@ -1025,7 +1029,7 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter) goto out_unlock; dreq->inode = inode; - 
dreq->bytes_left = dreq->max_count = iov_iter_count(iter); + dreq->bytes_left = dreq->max_count = count; dreq->io_start = pos; dreq->ctx = get_nfs_open_context(nfs_file_open_context(iocb->ki_filp)); l_ctx = nfs_get_lock_context(dreq->ctx); diff --git a/fs/nfs/file.c b/fs/nfs/file.c index df4dd8e7e62e..c26847c84d00 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -629,12 +629,8 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from) if (result) return result; - if (iocb->ki_flags & IOCB_DIRECT) { - result = generic_write_checks(iocb, from); - if (result <= 0) - return result; + if (iocb->ki_flags & IOCB_DIRECT) return nfs_file_direct_write(iocb, from); - } dprintk("NFS: write(%pD2, %zu@%Ld)\n", file, count, (long long) iocb->ki_pos); -- 2.7.4 ^ permalink raw reply related [flat|nested] 69+ messages in thread
* [PATCH v4 18/28] NFS: Move buffered I/O locking into nfs_file_write() 2016-07-06 22:29 ` [PATCH v4 17/28] NFS Cleanup: move call to generic_write_checks() into fs/nfs/direct.c Trond Myklebust @ 2016-07-06 22:29 ` Trond Myklebust 2016-07-06 22:29 ` [PATCH v4 19/28] NFS: Do not serialise O_DIRECT reads and writes Trond Myklebust 0 siblings, 1 reply; 69+ messages in thread From: Trond Myklebust @ 2016-07-06 22:29 UTC (permalink / raw) To: linux-nfs Preparation for the patch that de-serialises O_DIRECT reads and writes. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> --- fs/nfs/file.c | 27 +++++++++++++++------------ 1 file changed, 15 insertions(+), 12 deletions(-) diff --git a/fs/nfs/file.c b/fs/nfs/file.c index c26847c84d00..46cf0afe3c0f 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -623,7 +623,6 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from) struct inode *inode = file_inode(file); unsigned long written = 0; ssize_t result; - size_t count = iov_iter_count(from); result = nfs_key_timeout_notify(file, inode); if (result) @@ -633,9 +632,8 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from) return nfs_file_direct_write(iocb, from); dprintk("NFS: write(%pD2, %zu@%Ld)\n", - file, count, (long long) iocb->ki_pos); + file, iov_iter_count(from), (long long) iocb->ki_pos); - result = -EBUSY; if (IS_SWAPFILE(inode)) goto out_swapfile; /* @@ -647,28 +645,33 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from) goto out; } - result = count; - if (!count) + inode_lock(inode); + result = generic_write_checks(iocb, from); + if (result > 0) { + current->backing_dev_info = inode_to_bdi(inode); + result = generic_perform_write(file, from, iocb->ki_pos); + current->backing_dev_info = NULL; + } + inode_unlock(inode); + if (result <= 0) goto out; - result = generic_file_write_iter(iocb, from); - if (result > 0) - written = result; + written = generic_write_sync(iocb, result); + iocb->ki_pos += written; /* Return 
error values */ - if (result >= 0 && nfs_need_check_write(file, inode)) { + if (nfs_need_check_write(file, inode)) { int err = vfs_fsync(file, 0); if (err < 0) result = err; } - if (result > 0) - nfs_add_stats(inode, NFSIOS_NORMALWRITTENBYTES, written); + nfs_add_stats(inode, NFSIOS_NORMALWRITTENBYTES, written); out: return result; out_swapfile: printk(KERN_INFO "NFS: attempt to write to active swap file!\n"); - goto out; + return -EBUSY; } EXPORT_SYMBOL_GPL(nfs_file_write); -- 2.7.4 ^ permalink raw reply related [flat|nested] 69+ messages in thread
* [PATCH v4 19/28] NFS: Do not serialise O_DIRECT reads and writes 2016-07-06 22:29 ` [PATCH v4 18/28] NFS: Move buffered I/O locking into nfs_file_write() Trond Myklebust @ 2016-07-06 22:29 ` Trond Myklebust 2016-07-06 22:29 ` [PATCH v4 20/28] NFS: Cleanup nfs_direct_complete() Trond Myklebust 0 siblings, 1 reply; 69+ messages in thread From: Trond Myklebust @ 2016-07-06 22:29 UTC (permalink / raw) To: linux-nfs Allow dio requests to be scheduled in parallel, but ensuring that they do not conflict with buffered I/O. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> --- fs/nfs/Makefile | 2 +- fs/nfs/direct.c | 41 +++----------- fs/nfs/file.c | 12 ++-- fs/nfs/internal.h | 8 +++ fs/nfs/io.c | 147 +++++++++++++++++++++++++++++++++++++++++++++++++ include/linux/nfs_fs.h | 1 + 6 files changed, 174 insertions(+), 37 deletions(-) create mode 100644 fs/nfs/io.c diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile index 8664417955a2..6abdda209642 100644 --- a/fs/nfs/Makefile +++ b/fs/nfs/Makefile @@ -6,7 +6,7 @@ obj-$(CONFIG_NFS_FS) += nfs.o CFLAGS_nfstrace.o += -I$(src) nfs-y := client.o dir.o file.o getroot.o inode.o super.o \ - direct.o pagelist.o read.o symlink.o unlink.o \ + io.o direct.o pagelist.o read.o symlink.o unlink.o \ write.o namespace.o mount_clnt.o nfstrace.o nfs-$(CONFIG_ROOT_NFS) += nfsroot.o nfs-$(CONFIG_SYSCTL) += sysctl.o diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c index 0169eca8eb42..6d0e88096440 100644 --- a/fs/nfs/direct.c +++ b/fs/nfs/direct.c @@ -578,17 +578,12 @@ ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter) if (!count) goto out; - inode_lock(inode); - result = nfs_sync_mapping(mapping); - if (result) - goto out_unlock; - task_io_account_read(count); result = -ENOMEM; dreq = nfs_direct_req_alloc(); if (dreq == NULL) - goto out_unlock; + goto out; dreq->inode = inode; dreq->bytes_left = dreq->max_count = count; @@ -603,10 +598,12 @@ ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter) if 
(!is_sync_kiocb(iocb)) dreq->iocb = iocb; + nfs_start_io_direct(inode); + NFS_I(inode)->read_io += count; result = nfs_direct_read_schedule_iovec(dreq, iter, iocb->ki_pos); - inode_unlock(inode); + nfs_end_io_direct(inode); if (!result) { result = nfs_direct_wait(dreq); @@ -614,13 +611,8 @@ ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter) iocb->ki_pos += result; } - nfs_direct_req_release(dreq); - return result; - out_release: nfs_direct_req_release(dreq); -out_unlock: - inode_unlock(inode); out: return result; } @@ -1008,25 +1000,12 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter) pos = iocb->ki_pos; end = (pos + iov_iter_count(iter) - 1) >> PAGE_SHIFT; - inode_lock(inode); - - result = nfs_sync_mapping(mapping); - if (result) - goto out_unlock; - - if (mapping->nrpages) { - result = invalidate_inode_pages2_range(mapping, - pos >> PAGE_SHIFT, end); - if (result) - goto out_unlock; - } - task_io_account_write(count); result = -ENOMEM; dreq = nfs_direct_req_alloc(); if (!dreq) - goto out_unlock; + goto out; dreq->inode = inode; dreq->bytes_left = dreq->max_count = count; @@ -1041,6 +1020,8 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter) if (!is_sync_kiocb(iocb)) dreq->iocb = iocb; + nfs_start_io_direct(inode); + result = nfs_direct_write_schedule_iovec(dreq, iter, pos); if (mapping->nrpages) { @@ -1048,7 +1029,7 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter) pos >> PAGE_SHIFT, end); } - inode_unlock(inode); + nfs_end_io_direct(inode); if (!result) { result = nfs_direct_wait(dreq); @@ -1058,13 +1039,9 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter) generic_write_sync(iocb, result); } } - nfs_direct_req_release(dreq); - return result; - out_release: nfs_direct_req_release(dreq); -out_unlock: - inode_unlock(inode); +out: return result; } diff --git a/fs/nfs/file.c b/fs/nfs/file.c index 46cf0afe3c0f..9f8da9e1b23f 100644 --- 
a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -170,12 +170,14 @@ nfs_file_read(struct kiocb *iocb, struct iov_iter *to) iocb->ki_filp, iov_iter_count(to), (unsigned long) iocb->ki_pos); - result = nfs_revalidate_mapping_protected(inode, iocb->ki_filp->f_mapping); + nfs_start_io_read(inode); + result = nfs_revalidate_mapping(inode, iocb->ki_filp->f_mapping); if (!result) { result = generic_file_read_iter(iocb, to); if (result > 0) nfs_add_stats(inode, NFSIOS_NORMALREADBYTES, result); } + nfs_end_io_read(inode); return result; } EXPORT_SYMBOL_GPL(nfs_file_read); @@ -191,12 +193,14 @@ nfs_file_splice_read(struct file *filp, loff_t *ppos, dprintk("NFS: splice_read(%pD2, %lu@%Lu)\n", filp, (unsigned long) count, (unsigned long long) *ppos); - res = nfs_revalidate_mapping_protected(inode, filp->f_mapping); + nfs_start_io_read(inode); + res = nfs_revalidate_mapping(inode, filp->f_mapping); if (!res) { res = generic_file_splice_read(filp, ppos, pipe, count, flags); if (res > 0) nfs_add_stats(inode, NFSIOS_NORMALREADBYTES, res); } + nfs_end_io_read(inode); return res; } EXPORT_SYMBOL_GPL(nfs_file_splice_read); @@ -645,14 +649,14 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from) goto out; } - inode_lock(inode); + nfs_start_io_write(inode); result = generic_write_checks(iocb, from); if (result > 0) { current->backing_dev_info = inode_to_bdi(inode); result = generic_perform_write(file, from, iocb->ki_pos); current->backing_dev_info = NULL; } - inode_unlock(inode); + nfs_end_io_write(inode); if (result <= 0) goto out; diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h index 0eb5c924886d..159b64ede82a 100644 --- a/fs/nfs/internal.h +++ b/fs/nfs/internal.h @@ -411,6 +411,14 @@ extern void __exit unregister_nfs_fs(void); extern bool nfs_sb_active(struct super_block *sb); extern void nfs_sb_deactive(struct super_block *sb); +/* io.c */ +extern void nfs_start_io_read(struct inode *inode); +extern void nfs_end_io_read(struct inode *inode); +extern void 
nfs_start_io_write(struct inode *inode); +extern void nfs_end_io_write(struct inode *inode); +extern void nfs_start_io_direct(struct inode *inode); +extern void nfs_end_io_direct(struct inode *inode); + /* namespace.c */ #define NFS_PATH_CANONICAL 1 extern char *nfs_path(char **p, struct dentry *dentry, diff --git a/fs/nfs/io.c b/fs/nfs/io.c new file mode 100644 index 000000000000..1fc5d1ce327e --- /dev/null +++ b/fs/nfs/io.c @@ -0,0 +1,147 @@ +/* + * Copyright (c) 2016 Trond Myklebust + * + * I/O and data path helper functionality. + */ + +#include <linux/types.h> +#include <linux/kernel.h> +#include <linux/bitops.h> +#include <linux/rwsem.h> +#include <linux/fs.h> +#include <linux/nfs_fs.h> + +#include "internal.h" + +/* Call with exclusively locked inode->i_rwsem */ +static void nfs_block_o_direct(struct nfs_inode *nfsi, struct inode *inode) +{ + if (test_bit(NFS_INO_ODIRECT, &nfsi->flags)) { + clear_bit(NFS_INO_ODIRECT, &nfsi->flags); + inode_dio_wait(inode); + } +} + +/** + * nfs_start_io_read - declare the file is being used for buffered reads + * @inode - file inode + * + * Declare that a buffered read operation is about to start, and ensure + * that we block all direct I/O. + * On exit, the function ensures that the NFS_INO_ODIRECT flag is unset, + * and holds a shared lock on inode->i_rwsem to ensure that the flag + * cannot be changed. + * In practice, this means that buffered read operations are allowed to + * execute in parallel, thanks to the shared lock, whereas direct I/O + * operations need to wait to grab an exclusive lock in order to set + * NFS_INO_ODIRECT. + * Note that buffered writes and truncates both take a write lock on + * inode->i_rwsem, meaning that those are serialised w.r.t. the reads. + */ +void +nfs_start_io_read(struct inode *inode) +{ + struct nfs_inode *nfsi = NFS_I(inode); + /* Be an optimist! 
*/ + down_read(&inode->i_rwsem); + if (test_bit(NFS_INO_ODIRECT, &nfsi->flags) == 0) + return; + up_read(&inode->i_rwsem); + /* Slow path.... */ + down_write(&inode->i_rwsem); + nfs_block_o_direct(nfsi, inode); + downgrade_write(&inode->i_rwsem); +} + +/** + * nfs_end_io_read - declare that the buffered read operation is done + * @inode - file inode + * + * Declare that a buffered read operation is done, and release the shared + * lock on inode->i_rwsem. + */ +void +nfs_end_io_read(struct inode *inode) +{ + up_read(&inode->i_rwsem); +} + +/** + * nfs_start_io_write - declare the file is being used for buffered writes + * @inode - file inode + * + * Declare that a buffered write operation is about to start, and ensure + * that we block all direct I/O. + */ +void +nfs_start_io_write(struct inode *inode) +{ + down_write(&inode->i_rwsem); + nfs_block_o_direct(NFS_I(inode), inode); +} + +/** + * nfs_end_io_write - declare that the buffered write operation is done + * @inode - file inode + * + * Declare that a buffered write operation is done, and release the + * lock on inode->i_rwsem. + */ +void +nfs_end_io_write(struct inode *inode) +{ + up_write(&inode->i_rwsem); +} + +/* Call with exclusively locked inode->i_rwsem */ +static void nfs_block_buffered(struct nfs_inode *nfsi, struct inode *inode) +{ + if (!test_bit(NFS_INO_ODIRECT, &nfsi->flags)) { + set_bit(NFS_INO_ODIRECT, &nfsi->flags); + nfs_wb_all(inode); + } +} + +/** + * nfs_start_io_direct - declare the file is being used for direct i/o + * @inode - file inode + * + * Declare that a direct I/O operation is about to start, and ensure + * that we block all buffered I/O. + * On exit, the function ensures that the NFS_INO_ODIRECT flag is set, + * and holds a shared lock on inode->i_rwsem to ensure that the flag + * cannot be changed. 
+ * In practice, this means that direct I/O operations are allowed to + * execute in parallel, thanks to the shared lock, whereas buffered I/O + * operations need to wait to grab an exclusive lock in order to clear + * NFS_INO_ODIRECT. + * Note that buffered writes and truncates both take a write lock on + * inode->i_rwsem, meaning that those are serialised w.r.t. O_DIRECT. + */ +void +nfs_start_io_direct(struct inode *inode) +{ + struct nfs_inode *nfsi = NFS_I(inode); + /* Be an optimist! */ + down_read(&inode->i_rwsem); + if (test_bit(NFS_INO_ODIRECT, &nfsi->flags) != 0) + return; + up_read(&inode->i_rwsem); + /* Slow path.... */ + down_write(&inode->i_rwsem); + nfs_block_buffered(nfsi, inode); + downgrade_write(&inode->i_rwsem); +} + +/** + * nfs_end_io_direct - declare that the direct i/o operation is done + * @inode - file inode + * + * Declare that a direct I/O operation is done, and release the shared + * lock on inode->i_rwsem. + */ +void +nfs_end_io_direct(struct inode *inode) +{ + up_read(&inode->i_rwsem); +} diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h index 120dd04b553c..225d17d35277 100644 --- a/include/linux/nfs_fs.h +++ b/include/linux/nfs_fs.h @@ -210,6 +210,7 @@ struct nfs_inode { #define NFS_INO_LAYOUTCOMMIT (9) /* layoutcommit required */ #define NFS_INO_LAYOUTCOMMITTING (10) /* layoutcommit inflight */ #define NFS_INO_LAYOUTSTATS (11) /* layoutstats inflight */ +#define NFS_INO_ODIRECT (12) /* I/O setting is O_DIRECT */ static inline struct nfs_inode *NFS_I(const struct inode *inode) { -- 2.7.4 ^ permalink raw reply related [flat|nested] 69+ messages in thread
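The mode switch in fs/nfs/io.c can be followed with a toy, single-threaded model. This is a sketch only: `struct toy_rwsem` never blocks and merely asserts that each transition would have been legal, and every `toy_*` name is invented for illustration. It shows the optimistic shared-lock fast path, the exclusive slow path that flips the flag, and the downgrade back to a shared lock. The places where the kernel waits (`inode_dio_wait()` or `nfs_wb_all()`) are reduced to a comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of inode->i_rwsem: no blocking, only legality checks. */
struct toy_rwsem {
	int readers;
	bool writer;
};

struct toy_inode {
	struct toy_rwsem rwsem;
	bool odirect;		/* models NFS_INO_ODIRECT */
	int slow_paths;		/* how often the flag had to be flipped */
};

static void toy_down_read(struct toy_rwsem *s)
{
	assert(!s->writer);
	s->readers++;
}

static void toy_up_read(struct toy_rwsem *s)
{
	assert(s->readers > 0);
	s->readers--;
}

static void toy_down_write(struct toy_rwsem *s)
{
	assert(!s->writer && s->readers == 0);
	s->writer = true;
}

static void toy_downgrade_write(struct toy_rwsem *s)
{
	assert(s->writer);
	s->writer = false;
	s->readers = 1;
}

/* Returns with a shared lock held and the flag in the wanted state. */
static void toy_start_io(struct toy_inode *inode, bool want_odirect)
{
	toy_down_read(&inode->rwsem);		/* be an optimist */
	if (inode->odirect == want_odirect)
		return;
	toy_up_read(&inode->rwsem);
	toy_down_write(&inode->rwsem);		/* slow path */
	if (inode->odirect != want_odirect) {
		/* kernel: inode_dio_wait() or nfs_wb_all() happens here */
		inode->odirect = want_odirect;
		inode->slow_paths++;
	}
	toy_downgrade_write(&inode->rwsem);
}

static void toy_start_io_direct(struct toy_inode *inode)
{
	toy_start_io(inode, true);
}

static void toy_start_io_read(struct toy_inode *inode)
{
	toy_start_io(inode, false);
}

static void toy_end_io(struct toy_inode *inode)
{
	toy_up_read(&inode->rwsem);
}
```

In the model, two back-to-back direct I/O requests share the lock and only the first pays for the slow path; the next buffered read pays once more to clear the flag. Buffered writes are not modelled, since they simply hold the lock exclusively for their whole duration.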
* [PATCH v4 20/28] NFS: Cleanup nfs_direct_complete() 2016-07-06 22:29 ` [PATCH v4 19/28] NFS: Do not serialise O_DIRECT reads and writes Trond Myklebust @ 2016-07-06 22:29 ` Trond Myklebust 2016-07-06 22:29 ` [PATCH v4 21/28] NFS: Remove redundant waits for O_DIRECT in fsync() and write_begin() Trond Myklebust 0 siblings, 1 reply; 69+ messages in thread From: Trond Myklebust @ 2016-07-06 22:29 UTC (permalink / raw) To: linux-nfs There is only one caller that sets the "write" argument to true, so just move the call to nfs_zap_mapping() and get rid of the now redundant argument. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> --- fs/nfs/direct.c | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c index 6d0e88096440..c16d33eb1ddf 100644 --- a/fs/nfs/direct.c +++ b/fs/nfs/direct.c @@ -372,13 +372,10 @@ out: * Synchronous I/O uses a stack-allocated iocb. Thus we can't trust * the iocb is still valid here if this is a synchronous request. 
*/ -static void nfs_direct_complete(struct nfs_direct_req *dreq, bool write) +static void nfs_direct_complete(struct nfs_direct_req *dreq) { struct inode *inode = dreq->inode; - if (write) - nfs_zap_mapping(inode, inode->i_mapping); - inode_dio_end(inode); if (dreq->iocb) { @@ -431,7 +428,7 @@ static void nfs_direct_read_completion(struct nfs_pgio_header *hdr) } out_put: if (put_dreq(dreq)) - nfs_direct_complete(dreq, false); + nfs_direct_complete(dreq); hdr->release(hdr); } @@ -537,7 +534,7 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq, } if (put_dreq(dreq)) - nfs_direct_complete(dreq, false); + nfs_direct_complete(dreq); return 0; } @@ -764,7 +761,8 @@ static void nfs_direct_write_schedule_work(struct work_struct *work) nfs_direct_write_reschedule(dreq); break; default: - nfs_direct_complete(dreq, true); + nfs_zap_mapping(dreq->inode, dreq->inode->i_mapping); + nfs_direct_complete(dreq); } } -- 2.7.4 ^ permalink raw reply related [flat|nested] 69+ messages in thread
* [PATCH v4 21/28] NFS: Remove redundant waits for O_DIRECT in fsync() and write_begin() 2016-07-06 22:29 ` [PATCH v4 20/28] NFS: Cleanup nfs_direct_complete() Trond Myklebust @ 2016-07-06 22:29 ` Trond Myklebust 2016-07-06 22:29 ` [PATCH v4 22/28] NFS: Remove unused function nfs_revalidate_mapping_protected() Trond Myklebust 0 siblings, 1 reply; 69+ messages in thread From: Trond Myklebust @ 2016-07-06 22:29 UTC (permalink / raw) To: linux-nfs We're now waiting immediately after taking the locks, so waiting in fsync() and write_begin() is either redundant or potentially subject to livelock (if not holding the lock). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> --- fs/nfs/file.c | 6 ------ 1 file changed, 6 deletions(-) diff --git a/fs/nfs/file.c b/fs/nfs/file.c index 9f8da9e1b23f..0e9b4a068f13 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -276,7 +276,6 @@ nfs_file_fsync(struct file *file, loff_t start, loff_t end, int datasync) trace_nfs_fsync_enter(inode); - inode_dio_wait(inode); do { ret = filemap_write_and_wait_range(inode->i_mapping, start, end); if (ret != 0) @@ -361,11 +360,6 @@ static int nfs_write_begin(struct file *file, struct address_space *mapping, file, mapping->host->i_ino, len, (long long) pos); start: - /* - * Wait for O_DIRECT to complete - */ - inode_dio_wait(mapping->host); - page = grab_cache_page_write_begin(mapping, index, flags); if (!page) return -ENOMEM; -- 2.7.4 ^ permalink raw reply related [flat|nested] 69+ messages in thread
* [PATCH v4 22/28] NFS: Remove unused function nfs_revalidate_mapping_protected()
From: Trond Myklebust @ 2016-07-06 22:29 UTC
To: linux-nfs

Clean up...

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/inode.c         | 38 ++++----------------------------------
 include/linux/nfs_fs.h |  1 -
 2 files changed, 4 insertions(+), 35 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 6c0618eb5d57..0e0500f2bb6b 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -1131,14 +1131,12 @@ out:
 }
 
 /**
- * __nfs_revalidate_mapping - Revalidate the pagecache
+ * nfs_revalidate_mapping - Revalidate the pagecache
  * @inode - pointer to host inode
  * @mapping - pointer to mapping
- * @may_lock - take inode->i_mutex?
  */
-static int __nfs_revalidate_mapping(struct inode *inode,
-		struct address_space *mapping,
-		bool may_lock)
+int nfs_revalidate_mapping(struct inode *inode,
+		struct address_space *mapping)
 {
 	struct nfs_inode *nfsi = NFS_I(inode);
 	unsigned long *bitlock = &nfsi->flags;
@@ -1187,12 +1185,7 @@ static int __nfs_revalidate_mapping(struct inode *inode,
 	nfsi->cache_validity &= ~NFS_INO_INVALID_DATA;
 	spin_unlock(&inode->i_lock);
 	trace_nfs_invalidate_mapping_enter(inode);
-	if (may_lock) {
-		inode_lock(inode);
-		ret = nfs_invalidate_mapping(inode, mapping);
-		inode_unlock(inode);
-	} else
-		ret = nfs_invalidate_mapping(inode, mapping);
+	ret = nfs_invalidate_mapping(inode, mapping);
 	trace_nfs_invalidate_mapping_exit(inode, ret);
 
 	clear_bit_unlock(NFS_INO_INVALIDATING, bitlock);
@@ -1202,29 +1195,6 @@ out:
 	return ret;
 }
 
-/**
- * nfs_revalidate_mapping - Revalidate the pagecache
- * @inode - pointer to host inode
- * @mapping - pointer to mapping
- */
-int nfs_revalidate_mapping(struct inode *inode, struct address_space *mapping)
-{
-	return __nfs_revalidate_mapping(inode, mapping, false);
-}
-
-/**
- * nfs_revalidate_mapping_protected - Revalidate the pagecache
- * @inode - pointer to host inode
- * @mapping - pointer to mapping
- *
- * Differs from nfs_revalidate_mapping() in that it grabs the inode->i_mutex
- * while invalidating the mapping.
- */
-int nfs_revalidate_mapping_protected(struct inode *inode, struct address_space *mapping)
-{
-	return __nfs_revalidate_mapping(inode, mapping, true);
-}
-
 static bool nfs_file_has_writers(struct nfs_inode *nfsi)
 {
 	struct inode *inode = &nfsi->vfs_inode;
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 225d17d35277..810124b33327 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -351,7 +351,6 @@ extern int nfs_revalidate_inode_rcu(struct nfs_server *server, struct inode *ino
 extern int __nfs_revalidate_inode(struct nfs_server *, struct inode *);
 extern int nfs_revalidate_mapping(struct inode *inode, struct address_space *mapping);
 extern int nfs_revalidate_mapping_rcu(struct inode *inode);
-extern int nfs_revalidate_mapping_protected(struct inode *inode, struct address_space *mapping);
 extern int nfs_setattr(struct dentry *, struct iattr *);
 extern void nfs_setattr_update_inode(struct inode *inode, struct iattr *attr, struct nfs_fattr *);
 extern void nfs_setsecurity(struct inode *inode, struct nfs_fattr *fattr,
-- 
2.7.4
* [PATCH v4 23/28] NFS: Do not aggressively cache file attributes in the case of O_DIRECT
From: Trond Myklebust @ 2016-07-06 22:30 UTC
To: linux-nfs

A file that is open for O_DIRECT is by definition not obeying
close-to-open cache consistency semantics, so let's not cache the
attributes too aggressively either.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/inode.c    | 9 +++++++--
 fs/nfs/internal.h | 5 +++++
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 0e0500f2bb6b..7688436b19ba 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -1214,6 +1214,11 @@ static bool nfs_file_has_writers(struct nfs_inode *nfsi)
 			list)->mode & FMODE_WRITE) == FMODE_WRITE;
 }
 
+static bool nfs_file_has_buffered_writers(struct nfs_inode *nfsi)
+{
+	return nfs_file_has_writers(nfsi) && nfs_file_io_is_buffered(nfsi);
+}
+
 static unsigned long nfs_wcc_update_inode(struct inode *inode, struct nfs_fattr *fattr)
 {
 	struct nfs_inode *nfsi = NFS_I(inode);
@@ -1278,7 +1283,7 @@ static int nfs_check_inode_attributes(struct inode *inode, struct nfs_fattr *fat
 	if ((fattr->valid & NFS_ATTR_FATTR_TYPE) && (inode->i_mode & S_IFMT) != (fattr->mode & S_IFMT))
 		return -EIO;
 
-	if (!nfs_file_has_writers(nfsi)) {
+	if (!nfs_file_has_buffered_writers(nfsi)) {
 		/* Verify a few of the more important attributes */
 		if ((fattr->valid & NFS_ATTR_FATTR_CHANGE) != 0 && inode->i_version != fattr->change_attr)
 			invalid |= NFS_INO_INVALID_ATTR | NFS_INO_REVAL_PAGECACHE;
@@ -1660,7 +1665,7 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
 	unsigned long invalid = 0;
 	unsigned long now = jiffies;
 	unsigned long save_cache_validity;
-	bool have_writers = nfs_file_has_writers(nfsi);
+	bool have_writers = nfs_file_has_buffered_writers(nfsi);
 	bool cache_revalidated = true;
 
 	dfprintk(VFS, "NFS: %s(%s/%lu fh_crc=0x%08x ct=%d info=0x%x)\n",
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 159b64ede82a..01dccf18da0a 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -419,6 +419,11 @@ extern void nfs_end_io_write(struct inode *inode);
 extern void nfs_start_io_direct(struct inode *inode);
 extern void nfs_end_io_direct(struct inode *inode);
 
+static inline bool nfs_file_io_is_buffered(struct nfs_inode *nfsi)
+{
+	return test_bit(NFS_INO_ODIRECT, &nfsi->flags) == 0;
+}
+
 /* namespace.c */
 #define NFS_PATH_CANONICAL 1
 extern char *nfs_path(char **p, struct dentry *dentry,
-- 
2.7.4
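The predicate introduced by patch 23/28 can be sketched in userspace C as follows. This is only an illustrative model, not kernel code: `struct model_inode`, `MODEL_INO_ODIRECT` and the `writers` counter are hypothetical stand-ins for `struct nfs_inode`, the `NFS_INO_ODIRECT` flag bit, and the open-context walk done by `nfs_file_has_writers()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the NFS_INO_ODIRECT bit in nfsi->flags. */
#define MODEL_INO_ODIRECT (1UL << 0)

/* Hypothetical, simplified model of the state the patch consults. */
struct model_inode {
	unsigned long flags;	/* MODEL_INO_ODIRECT set while in O_DIRECT mode */
	unsigned int writers;	/* number of opens with write access */
};

/* Models nfs_file_has_writers(): is anyone holding the file open for write? */
static bool model_has_writers(const struct model_inode *mi)
{
	assert(mi != NULL);
	return mi->writers > 0;
}

/* Models nfs_file_io_is_buffered(): true unless the O_DIRECT bit is set. */
static bool model_io_is_buffered(const struct model_inode *mi)
{
	assert(mi != NULL);
	return (mi->flags & MODEL_INO_ODIRECT) == 0;
}

/* Aggressive attribute caching is only assumed safe when both hold. */
static bool model_has_buffered_writers(const struct model_inode *mi)
{
	return model_has_writers(mi) && model_io_is_buffered(mi);
}
```

With this model, a file that has writers but is in O_DIRECT mode no longer counts as having buffered writers, so the stricter attribute checks apply to it.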
* [PATCH v4 24/28] NFS: Getattr doesn't require data sync semantics
From: Trond Myklebust @ 2016-07-06 22:30 UTC
To: linux-nfs

When retrieving stat() information, NFS unfortunately does require us to
sync writes to disk in order to ensure that mtime and ctime are up to
date. However we shouldn't have to ensure that those writes are persisted.

Relaxing that requirement does mean that we may see an mtime/ctime change
if the server reboots and forces us to replay all writes.

The exception to this rule are pNFS clients that are required to send
layoutcommit, however that is dealt with by the call to pnfs_sync_inode()
in _nfs_revalidate_inode().

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/inode.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 7688436b19ba..35fda08dc4f6 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -661,9 +661,7 @@ int nfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
 	trace_nfs_getattr_enter(inode);
 	/* Flush out writes to the server in order to update c/mtime. */
 	if (S_ISREG(inode->i_mode)) {
-		inode_lock(inode);
-		err = nfs_sync_inode(inode);
-		inode_unlock(inode);
+		err = filemap_write_and_wait(inode->i_mapping);
 		if (err)
 			goto out;
 	}
-- 
2.7.4
* [PATCH v4 25/28] NFSv4.2: Fix a race in nfs42_proc_deallocate()
From: Trond Myklebust @ 2016-07-06 22:30 UTC
To: linux-nfs

When punching holes in a file, we want to ensure the operation is
serialised w.r.t. other writes, meaning that we want to call
nfs_sync_inode() while holding the inode lock.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/nfs42proc.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
index aa03ed09ba06..0f9f536e647b 100644
--- a/fs/nfs/nfs42proc.c
+++ b/fs/nfs/nfs42proc.c
@@ -113,15 +113,17 @@ int nfs42_proc_deallocate(struct file *filep, loff_t offset, loff_t len)
 	if (!nfs_server_capable(inode, NFS_CAP_DEALLOCATE))
 		return -EOPNOTSUPP;
 
-	nfs_wb_all(inode);
 	inode_lock(inode);
+	err = nfs_sync_inode(inode);
+	if (err)
+		goto out_unlock;
 
 	err = nfs42_proc_fallocate(&msg, filep, offset, len);
 	if (err == 0)
 		truncate_pagecache_range(inode, offset, (offset + len) -1);
 	if (err == -EOPNOTSUPP)
 		NFS_SERVER(inode)->caps &= ~NFS_CAP_DEALLOCATE;
-
+out_unlock:
 	inode_unlock(inode);
 	return err;
 }
-- 
2.7.4
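The ordering change in patch 25/28 (flush only after the lock is taken, with a single unlock on every exit path) can be modelled in userspace roughly as below. Everything here is a hypothetical stand-in: `struct model_file`, its `flush` and `punch` hooks (playing the roles of `nfs_sync_inode()` and `nfs42_proc_fallocate()`), and the boolean "lock" are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a lock flag plus hooks for the flush and the hole punch. */
struct model_file {
	bool locked;
	int (*flush)(struct model_file *f);	/* stands in for nfs_sync_inode() */
	int (*punch)(struct model_file *f);	/* stands in for nfs42_proc_fallocate() */
	int flushes_under_lock;
	int punches;
};

static void model_lock(struct model_file *f)
{
	assert(!f->locked);
	f->locked = true;
}

static void model_unlock(struct model_file *f)
{
	assert(f->locked);
	f->locked = false;
}

/* Flush inside the critical section so no new write can slip in between
 * the flush and the hole punch; a failed flush skips the punch entirely. */
static int model_deallocate(struct model_file *f)
{
	int err;

	model_lock(f);
	err = f->flush(f);
	if (err)
		goto out_unlock;
	err = f->punch(f);
out_unlock:
	model_unlock(f);
	return err;
}

/* Example hooks used to exercise the sketch. */
static int flush_ok(struct model_file *f)
{
	if (f->locked)
		f->flushes_under_lock++;
	return 0;
}

static int flush_fail(struct model_file *f)
{
	(void)f;
	return -5;	/* arbitrary negative error code for the model */
}

static int punch_ok(struct model_file *f)
{
	f->punches++;
	return 0;
}
```

In the pre-patch ordering the flush ran before the lock was taken, which left a window for a racing write between flush and punch; the sketch makes that window impossible by construction.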
* [PATCH v4 26/28] NFSv4.2: Fix writeback races in nfs4_copy_file_range
From: Trond Myklebust @ 2016-07-06 22:30 UTC
To: linux-nfs

We need to ensure that any writes to the destination file are serialised
with the copy, meaning that the writeback has to occur under the inode lock.

Also relax the writeback requirement on the source, and rely on the
stateid checking to tell us if the source rebooted.

Add the helper nfs_filemap_write_and_wait_range() to call
pnfs_sync_inode() as is appropriate for pNFS servers that may need a
layoutcommit.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/internal.h  |  3 +++
 fs/nfs/nfs42proc.c |  9 +++++++++
 fs/nfs/nfs4file.c  | 14 +-------------
 fs/nfs/write.c     | 18 ++++++++++++++++++
 4 files changed, 31 insertions(+), 13 deletions(-)

diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 01dccf18da0a..3b01c9146e15 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -512,6 +512,9 @@ int nfs_key_timeout_notify(struct file *filp, struct inode *inode);
 bool nfs_ctx_key_to_expire(struct nfs_open_context *ctx);
 void nfs_pageio_stop_mirroring(struct nfs_pageio_descriptor *pgio);
 
+int nfs_filemap_write_and_wait_range(struct address_space *mapping,
+		loff_t lstart, loff_t lend);
+
 #ifdef CONFIG_NFS_V4_1
 static inline
 void nfs_clear_pnfs_ds_commit_verifiers(struct pnfs_ds_commit_info *cinfo)
diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
index 0f9f536e647b..b7d457cea03f 100644
--- a/fs/nfs/nfs42proc.c
+++ b/fs/nfs/nfs42proc.c
@@ -156,11 +156,20 @@ static ssize_t _nfs42_proc_copy(struct file *src, loff_t pos_src,
 	if (status)
 		return status;
 
+	status = nfs_filemap_write_and_wait_range(file_inode(src)->i_mapping,
+			pos_src, pos_src + (loff_t)count - 1);
+	if (status)
+		return status;
+
 	status = nfs4_set_rw_stateid(&args.dst_stateid, dst_lock->open_context,
 				     dst_lock, FMODE_WRITE);
 	if (status)
 		return status;
 
+	status = nfs_sync_inode(dst_inode);
+	if (status)
+		return status;
+
 	status = nfs4_call_sync(server->client, server, &msg,
 				&args.seq_args, &res.seq_res, 0);
 	if (status == -ENOTSUPP)
diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
index 014b0e41ace5..7cdc0ab9e6f5 100644
--- a/fs/nfs/nfs4file.c
+++ b/fs/nfs/nfs4file.c
@@ -133,21 +133,9 @@ static ssize_t nfs4_copy_file_range(struct file *file_in, loff_t pos_in,
 				    struct file *file_out, loff_t pos_out,
 				    size_t count, unsigned int flags)
 {
-	struct inode *in_inode = file_inode(file_in);
-	struct inode *out_inode = file_inode(file_out);
-	int ret;
-
-	if (in_inode == out_inode)
+	if (file_inode(file_in) == file_inode(file_out))
 		return -EINVAL;
 
-	/* flush any pending writes */
-	ret = nfs_sync_inode(in_inode);
-	if (ret)
-		return ret;
-	ret = nfs_sync_inode(out_inode);
-	if (ret)
-		return ret;
-
 	return nfs42_proc_copy(file_in, pos_in, file_out, pos_out, count);
 }
 
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 3087fb6f1983..538a473b324b 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1913,6 +1913,24 @@ out_mark_dirty:
 EXPORT_SYMBOL_GPL(nfs_write_inode);
 
 /*
+ * Wrapper for filemap_write_and_wait_range()
+ *
+ * Needed for pNFS in order to ensure data becomes visible to the
+ * client.
+ */
+int nfs_filemap_write_and_wait_range(struct address_space *mapping,
+		loff_t lstart, loff_t lend)
+{
+	int ret;
+
+	ret = filemap_write_and_wait_range(mapping, lstart, lend);
+	if (ret == 0)
+		ret = pnfs_sync_inode(mapping->host, true);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(nfs_filemap_write_and_wait_range);
+
+/*
  * flush the inode to disk.
  */
 int nfs_wb_all(struct inode *inode)
-- 
2.7.4
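The shape of the new helper in patch 26/28 (write back first, and issue the pNFS sync only if the writeback succeeded) can be sketched in userspace as below. The names `model_step_fn`, `struct model_calls`, `model_flush` and `model_commit` are hypothetical; they merely stand in for `filemap_write_and_wait_range()` and `pnfs_sync_inode()` so that the control flow can be exercised outside the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type standing in for the two kernel calls. */
typedef int (*model_step_fn)(void *ctx);

/* Bookkeeping so the call order can be observed in a test. */
struct model_calls {
	int flushed;	/* times the writeback step ran */
	int committed;	/* times the layoutcommit-style step ran */
	int flush_ret;	/* value the writeback step should return */
};

static int model_flush(void *ctx)
{
	struct model_calls *c = (struct model_calls *)ctx;

	c->flushed++;
	return c->flush_ret;
}

static int model_commit(void *ctx)
{
	struct model_calls *c = (struct model_calls *)ctx;

	c->committed++;
	return 0;
}

/* Run the second step only when the first one reported success, and
 * return whichever status was produced last. */
static int model_write_and_wait_then_sync(model_step_fn flush,
					  model_step_fn commit, void *ctx)
{
	int ret;

	assert(flush != NULL && commit != NULL);
	ret = flush(ctx);
	if (ret == 0)
		ret = commit(ctx);
	return ret;
}
```

A failed writeback therefore never triggers the commit step, and its error is what the caller sees.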
* [PATCH v4 27/28] NFSv4.2: llseek(SEEK_HOLE) and llseek(SEEK_DATA) don't require data sync
From: Trond Myklebust @ 2016-07-06 22:30 UTC
To: linux-nfs

We want to ensure that we write the cached data to the server, but
don't require it be synced to disk. If the server reboots, we will
get a stateid error, which will cause us to retry anyway.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/nfs42proc.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
index b7d457cea03f..616dc254b38b 100644
--- a/fs/nfs/nfs42proc.c
+++ b/fs/nfs/nfs42proc.c
@@ -269,7 +269,11 @@ static loff_t _nfs42_proc_llseek(struct file *filep,
 	if (status)
 		return status;
 
-	nfs_wb_all(inode);
+	status = nfs_filemap_write_and_wait_range(inode->i_mapping,
+			offset, LLONG_MAX);
+	if (status)
+		return status;
+
 	status = nfs4_call_sync(server->client, server, &msg,
 				&args.seq_args, &res.seq_res, 0);
 	if (status == -ENOTSUPP)
-- 
2.7.4
* [PATCH v4 28/28] NFS nfs_vm_page_mkwrite: Don't freeze me, Bro...
From: Trond Myklebust @ 2016-07-06 22:30 UTC
To: linux-nfs

Prevent filesystem freezes while handling the write page fault.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/file.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 0e9b4a068f13..039d58790629 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -569,6 +569,8 @@ static int nfs_vm_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 		filp, filp->f_mapping->host->i_ino,
 		(long long)page_offset(page));
 
+	sb_start_pagefault(inode->i_sb);
+
 	/* make sure the cache has finished storing the page */
 	nfs_fscache_wait_on_page_write(NFS_I(inode), page);
 
@@ -595,6 +597,7 @@ static int nfs_vm_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 out_unlock:
 	unlock_page(page);
 out:
+	sb_end_pagefault(inode->i_sb);
 	return ret;
 }
 
-- 
2.7.4
* Re: [PATCH v4 24/28] NFS: Getattr doesn't require data sync semantics
From: Christoph Hellwig @ 2016-07-18 3:48 UTC
To: Trond Myklebust; +Cc: linux-nfs

On Wed, Jul 06, 2016 at 06:30:01PM -0400, Trond Myklebust wrote:
> When retrieving stat() information, NFS unfortunately does require us to
> sync writes to disk in order to ensure that mtime and ctime are up to
> date. However we shouldn't have to ensure that those writes are persisted.
>
> Relaxing that requirement does mean that we may see an mtime/ctime change
> if the server reboots and forces us to replay all writes.
>
> The exception to this rule are pNFS clients that are required to send
> layoutcommit, however that is dealt with by the call to pnfs_sync_inode()
> in _nfs_revalidate_inode().

This one breaks xfstests generic/207 on block/scsi layout for me. The
reason for that is that we need a layoutcommit after writing out all
data for the file for the file size to be updated on the server.

Below is my attempt to fix this by re-adding pnfs_sync_inode to
nfs_getattr. The call in _nfs_revalidate_inode isn't enough as it
doesn't get called in most cases we care about.

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 22a53ee..8bd04cf 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -660,11 +660,20 @@ int nfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
 	int err = 0;
 
 	trace_nfs_getattr_enter(inode);
-	/* Flush out writes to the server in order to update c/mtime. */
+
+	/*
+	 * Flush out writes to the server in order to update c/mtime as well
+	 * as the file size. In the pNFS case this also requires a
+	 * LAYOUTCOMMIT.
+	 */
 	if (S_ISREG(inode->i_mode)) {
 		err = filemap_write_and_wait(inode->i_mapping);
 		if (err)
 			goto out;
+
+		err = pnfs_sync_inode(inode, true);
+		if (err)
+			goto out;
 	}
 
 	/*
* Re: [PATCH v4 24/28] NFS: Getattr doesn't require data sync semantics
From: Trond Myklebust @ 2016-07-18 4:32 UTC
To: Christoph Hellwig; +Cc: List Linux

Hi Christoph,

> On Jul 17, 2016, at 23:48, Christoph Hellwig <hch@infradead.org> wrote:
>
> On Wed, Jul 06, 2016 at 06:30:01PM -0400, Trond Myklebust wrote:
>> When retrieving stat() information, NFS unfortunately does require us to
>> sync writes to disk in order to ensure that mtime and ctime are up to
>> date. However we shouldn't have to ensure that those writes are persisted.
>>
>> Relaxing that requirement does mean that we may see an mtime/ctime change
>> if the server reboots and forces us to replay all writes.
>>
>> The exception to this rule are pNFS clients that are required to send
>> layoutcommit, however that is dealt with by the call to pnfs_sync_inode()
>> in _nfs_revalidate_inode().
>
> This one breaks xfstests generic/207 on block/scsi layout for me. The
> reason for that is that we need a layoutcommit after writing out all
> data for the file for the file size to be updated on the server.
>
> Below is my attempt to fix this by re-adding pnfs_sync_inode to
> nfs_getattr. The call in _nfs_revalidate_inode isn't enough as it
> doesn't get called in most cases we care about.
>

I’m not understanding this argument. Why do we care if the file size
is up to date on the server if we’re not sending an actual GETATTR on
the wire to retrieve the file size?

Cheers
  Trond
* Re: [PATCH v4 24/28] NFS: Getattr doesn't require data sync semantics
From: Trond Myklebust @ 2016-07-18 4:59 UTC
To: hch; +Cc: linux-nfs

On Mon, 2016-07-18 at 00:32 -0400, Trond Myklebust wrote:
> Hi Christoph,
> 
> > 
> > On Jul 17, 2016, at 23:48, Christoph Hellwig <hch@infradead.org>
> > wrote:
> > 
> > On Wed, Jul 06, 2016 at 06:30:01PM -0400, Trond Myklebust wrote:
> > > 
> > > When retrieving stat() information, NFS unfortunately does
> > > require us to
> > > sync writes to disk in order to ensure that mtime and ctime are
> > > up to
> > > date. However we shouldn't have to ensure that those writes are
> > > persisted.
> > > 
> > > Relaxing that requirement does mean that we may see an
> > > mtime/ctime change
> > > if the server reboots and forces us to replay all writes.
> > > 
> > > The exception to this rule are pNFS clients that are required to
> > > send
> > > layoutcommit, however that is dealt with by the call to
> > > pnfs_sync_inode()
> > > in _nfs_revalidate_inode().
> > 
> > This one breaks xfstests generic/207 on block/scsi layout for
> > me.  The
> > reason for that is that we need a layoutcommit after writing out
> > all
> > data for the file for the file size to be updated on the server.
> > 
> > Below is my attempt to fix this by re-adding pnfs_sync_inode to
> > nfs_getattr.  The call in _nfs_revalidate_inode isn't enough as it
> > doesn't get called in most cases we care about.
> > 
> 
> I’m not understanding this argument. Why do we care if the file size
> is up to date on the server if we’re not sending an actual GETATTR on
> the wire to retrieve the file size?
> 
> Cheers
>   Trond

Actually... The problem might be that a previous attribute update is
marking the attribute cache as being revalidated. Does the following
patch help?

Cheers
  Trond

8<-----------------------------------------------------------
From 10b7e9ad44881fcd46ac24eb7374377c6e8962ed Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.myklebust@primarydata.com>
Date: Mon, 18 Jul 2016 00:51:01 -0400
Subject: [PATCH] pNFS: Don't mark the inode as revalidated if a LAYOUTCOMMIT
 is outstanding

We know that the attributes will need updating if there is still a
LAYOUTCOMMIT outstanding.

Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/inode.c | 5 ++++-
 fs/nfs/pnfs.h  | 7 +++++++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 35fda08dc4f6..9df45832e28b 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -1664,7 +1664,7 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
 	unsigned long now = jiffies;
 	unsigned long save_cache_validity;
 	bool have_writers = nfs_file_has_buffered_writers(nfsi);
-	bool cache_revalidated = true;
+	bool cache_revalidated;
 
 	dfprintk(VFS, "NFS: %s(%s/%lu fh_crc=0x%08x ct=%d info=0x%x)\n",
 			__func__, inode->i_sb->s_id, inode->i_ino,
@@ -1713,6 +1713,9 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
 	/* Do atomic weak cache consistency updates */
 	invalid |= nfs_wcc_update_inode(inode, fattr);
 
+
+	cache_revalidated = !pnfs_layoutcommit_outstanding(inode);
+
 	/* More cache consistency checks */
 	if (fattr->valid & NFS_ATTR_FATTR_CHANGE) {
 		if (inode->i_version != fattr->change_attr) {
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index d6be5299a55a..181283c4ebc3 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -629,6 +629,13 @@ pnfs_sync_inode(struct inode *inode, bool datasync)
 }
 
 static inline bool
+pnfs_layoutcommit_outstanding(struct inode *inode)
+{
+	return false;
+}
+
+
+static inline bool
 pnfs_roc(struct inode *ino)
 {
 	return false;
-- 
2.7.4

Trond Myklebust
Principal System Architect
www.primarydata.com
* Re: [PATCH v4 24/28] NFS: Getattr doesn't require data sync semantics
From: hch @ 2016-07-19 3:58 UTC
To: Trond Myklebust; +Cc: hch, linux-nfs

On Mon, Jul 18, 2016 at 04:59:09AM +0000, Trond Myklebust wrote:
> Actually... The problem might be that a previous attribute update is
> marking the attribute cache as being revalidated. Does the following
> patch help?

It doesn't. Also with your most recent linux-next branch the test
now cause the systems to OOM with or without your patch (with mine it's
still fine). I tested with your writeback branch from about two or
three days ago before, and with that + your patch it also 'just fails'
and doesn't OOM. Looks like whatever causes the bug also creates
a temporarily memory leak when combined with recent changes from your
tree, most likely something from the pnfs branch.
* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics
From: Benjamin Coddington @ 2016-07-19 20:00 UTC
To: hch; +Cc: Trond Myklebust, linux-nfs

On 18 Jul 2016, at 23:58, hch@infradead.org wrote:

> On Mon, Jul 18, 2016 at 04:59:09AM +0000, Trond Myklebust wrote:
>> Actually... The problem might be that a previous attribute update is
>> marking the attribute cache as being revalidated. Does the following
>> patch help?
>
> It doesn't. Also with your most recent linux-next branch the test
> now cause the systems to OOM with or without your patch (with mine it's
> still fine). I tested with your writeback branch from about two or
> three days ago before, and with that + your patch it also 'just fails'
> and doesn't OOM. Looks like whatever causes the bug also creates
> a temporarily memory leak when combined with recent changes from your
> tree, most likely something from the pnfs branch.

I couldn't find the memory leak using kmemleak, but it OOMs pretty quick. If I
insert an mdelay(200) just after the lookup_again: marker in
pnfs_update_layout() it doesn't OOM, but it seems stuck forever in a loop on
that marker:

[ 1230.635586] pnfs_find_alloc_layout Begin ino=ffff88003ef986f8 layout=ffff8800392bca58
[ 1230.636729] pnfs_find_lseg:Begin
[ 1230.637538] pnfs_find_lseg:Return lseg (null) ref 0
[ 1230.638582] --> send_layoutget
[ 1230.639499] --> nfs4_proc_layoutget
[ 1230.640525] --> nfs4_layoutget_prepare
[ 1230.641479] --> nfs41_setup_sequence
[ 1230.641581] <-- nfs4_proc_layoutget status=-512
[ 1230.643288] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=31
[ 1230.644348] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
[ 1230.645373] <-- nfs41_setup_sequence slotid=0 seqid=4376
[ 1230.646356] <-- nfs4_layoutget_prepare
[ 1230.647357] encode_sequence: sessionid=1468956665:2:3:0 seqid=4376 slotid=0 max_slotid=0 cache_this=0
[ 1230.648522] encode_layoutget: 1st type:0x5 iomode:2 off:122880 len:4096 mc:4096
[ 1230.650182] decode_layoutget roff:122880 rlen:4096 riomode:2, lo_type:0x5, lo.len:48
[ 1230.651331] --> nfs4_layoutget_done
[ 1230.652233] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=31
[ 1230.653409] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
[ 1230.654547] nfs4_free_slot: slotid 1 highest_used_slotid 0
[ 1230.655606] nfs41_sequence_done: Error 0 free the slot
[ 1230.656635] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
[ 1230.657739] <-- nfs4_layoutget_done
[ 1230.658650] --> nfs4_layoutget_release
[ 1230.659626] <-- nfs4_layoutget_release

This debug output is identical for every cycle of the loop. Have to stop for the
day.. more tomorrow.

Ben
* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics
From: Trond Myklebust @ 2016-07-19 20:06 UTC
To: Coddington Benjamin; +Cc: hch, List Linux

> On Jul 19, 2016, at 16:00, Benjamin Coddington <bcodding@redhat.com> wrote:
>
> On 18 Jul 2016, at 23:58, hch@infradead.org wrote:
>
>> On Mon, Jul 18, 2016 at 04:59:09AM +0000, Trond Myklebust wrote:
>>> Actually... The problem might be that a previous attribute update is
>>> marking the attribute cache as being revalidated. Does the following
>>> patch help?
>>
>> It doesn't. Also with your most recent linux-next branch the test
>> now cause the systems to OOM with or without your patch (with mine it's
>> still fine). I tested with your writeback branch from about two or
>> three days ago before, and with that + your patch it also 'just fails'
>> and doesn't OOM. Looks like whatever causes the bug also creates
>> a temporarily memory leak when combined with recent changes from your
>> tree, most likely something from the pnfs branch.
>
> I couldn't find the memory leak using kmemleak, but it OOMs pretty quick. If I
> insert an mdelay(200) just after the lookup_again: marker in
> pnfs_update_layout() it doesn't OOM, but it seems stuck forever in a loop on
> that marker:
>
> [ 1230.635586] pnfs_find_alloc_layout Begin ino=ffff88003ef986f8 layout=ffff8800392bca58
> [ 1230.636729] pnfs_find_lseg:Begin
> [ 1230.637538] pnfs_find_lseg:Return lseg (null) ref 0
> [ 1230.638582] --> send_layoutget
> [ 1230.639499] --> nfs4_proc_layoutget
> [ 1230.640525] --> nfs4_layoutget_prepare
> [ 1230.641479] --> nfs41_setup_sequence
> [ 1230.641581] <-- nfs4_proc_layoutget status=-512
> [ 1230.643288] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=31
> [ 1230.644348] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
> [ 1230.645373] <-- nfs41_setup_sequence slotid=0 seqid=4376
> [ 1230.646356] <-- nfs4_layoutget_prepare
> [ 1230.647357] encode_sequence: sessionid=1468956665:2:3:0 seqid=4376 slotid=0 max_slotid=0 cache_this=0
> [ 1230.648522] encode_layoutget: 1st type:0x5 iomode:2 off:122880 len:4096 mc:4096
> [ 1230.650182] decode_layoutget roff:122880 rlen:4096 riomode:2, lo_type:0x5, lo.len:48
> [ 1230.651331] --> nfs4_layoutget_done
> [ 1230.652233] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=31
> [ 1230.653409] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
> [ 1230.654547] nfs4_free_slot: slotid 1 highest_used_slotid 0
> [ 1230.655606] nfs41_sequence_done: Error 0 free the slot
> [ 1230.656635] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
> [ 1230.657739] <-- nfs4_layoutget_done
> [ 1230.658650] --> nfs4_layoutget_release
> [ 1230.659626] <-- nfs4_layoutget_release
>
> This debug output is identical for every cycle of the loop. Have to stop for the
> day.. more tomorrow.
>
> Ben
>

Duh… It’s this patch: pNFS: Fix post-layoutget error handling in pnfs_update_layout()
We have to pass through fatal errors… I’ll fix it.

Cheers
  Trond
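The failure mode discussed here (a retry loop that keeps spinning on `status=-512`) and the stated remedy ("pass through fatal errors") can be illustrated with a small userspace sketch. This is not the actual `pnfs_update_layout()` logic: the helper names are hypothetical, and the choice of `-EAGAIN` as the only retryable status is an assumption made purely for the example. The value 512 corresponds to the kernel-internal `ERESTARTSYS`, which is what the `status=-512` in the log above decodes to.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Kernel-internal ERESTARTSYS value; matches status=-512 in the log. */
#define MODEL_ERESTARTSYS 512

/* Hypothetical callback standing in for one layoutget attempt. */
typedef int (*model_attempt_fn)(void *ctx);

/* Assumption for illustration: only -EAGAIN is treated as transient. */
static bool model_is_retryable(int status)
{
	return status == -EAGAIN;
}

/* Retry transient failures up to max_tries; return success or any
 * non-retryable (fatal) status immediately instead of looping on it. */
static int model_update_with_retry(model_attempt_fn attempt, void *ctx,
				   int max_tries)
{
	int status = -EAGAIN;
	int i;

	assert(attempt != NULL && max_tries > 0);
	for (i = 0; i < max_tries; i++) {
		status = attempt(ctx);
		if (status == 0 || !model_is_retryable(status))
			break;
	}
	return status;
}

/* Example attempts: ctx points at an int counting the calls made. */
static int always_interrupted(void *ctx)
{
	++*(int *)ctx;
	return -MODEL_ERESTARTSYS;
}

static int succeeds_third_time(void *ctx)
{
	int *calls = (int *)ctx;

	return ++*calls < 3 ? -EAGAIN : 0;
}
```

In this model an interrupted attempt is reported to the caller after a single call, whereas treating every error as retryable would reproduce the endless loop seen in the debug output.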
* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics 2016-07-19 20:06 ` Trond Myklebust @ 2016-07-20 15:03 ` Benjamin Coddington 2016-07-21 8:22 ` hch 0 siblings, 1 reply; 69+ messages in thread From: Benjamin Coddington @ 2016-07-20 15:03 UTC (permalink / raw) To: Trond Myklebust; +Cc: hch, List Linux On 19 Jul 2016, at 16:06, Trond Myklebust wrote: >> On Jul 19, 2016, at 16:00, Benjamin Coddington <bcodding@redhat.com> >> wrote: >> >> On 18 Jul 2016, at 23:58, hch@infradead.org wrote: >> >>> On Mon, Jul 18, 2016 at 04:59:09AM +0000, Trond Myklebust wrote: >>>> Actually... The problem might be that a previous attribute update >>>> is >>>> marking the attribute cache as being revalidated. Does the >>>> following >>>> patch help? >>> >>> It doesn't. Also with your most recent linux-next branch the test >>> now cause the systems to OOM with or without your patch (with mine >>> it's >>> still fine). I tested with your writeback branch from about two or >>> three days ago before, and with that + your patch it also 'just >>> fails' >>> and doesn't OOM. Looks like whatever causes the bug also creates >>> a temporarily memory leak when combined with recent changes from >>> your >>> tree, most likely something from the pnfs branch. >> >> I couldn't find the memory leak using kmemleak, but it OOMs pretty >> quick. 
If I >> insert an mdelay(200) just after the lookup_again: marker in >> pnfs_update_layout() it doesn't OOM, but it seems stuck forever in a >> loop on >> that marker: >> >> [ 1230.635586] pnfs_find_alloc_layout Begin ino=ffff88003ef986f8 >> layout=ffff8800392bca58 >> [ 1230.636729] pnfs_find_lseg:Begin >> [ 1230.637538] pnfs_find_lseg:Return lseg (null) ref 0 >> [ 1230.638582] --> send_layoutget >> [ 1230.639499] --> nfs4_proc_layoutget >> [ 1230.640525] --> nfs4_layoutget_prepare >> [ 1230.641479] --> nfs41_setup_sequence >> [ 1230.641581] <-- nfs4_proc_layoutget status=-512 >> [ 1230.643288] --> nfs4_alloc_slot used_slots=0000 >> highest_used=4294967295 max_slots=31 >> [ 1230.644348] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 >> slotid=0 >> [ 1230.645373] <-- nfs41_setup_sequence slotid=0 seqid=4376 >> [ 1230.646356] <-- nfs4_layoutget_prepare >> [ 1230.647357] encode_sequence: sessionid=1468956665:2:3:0 seqid=4376 >> slotid=0 max_slotid=0 cache_this=0 >> [ 1230.648522] encode_layoutget: 1st type:0x5 iomode:2 off:122880 >> len:4096 mc:4096 >> [ 1230.650182] decode_layoutget roff:122880 rlen:4096 riomode:2, >> lo_type:0x5, lo.len:48 >> [ 1230.651331] --> nfs4_layoutget_done >> [ 1230.652233] --> nfs4_alloc_slot used_slots=0001 highest_used=0 >> max_slots=31 >> [ 1230.653409] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 >> slotid=1 >> [ 1230.654547] nfs4_free_slot: slotid 1 highest_used_slotid 0 >> [ 1230.655606] nfs41_sequence_done: Error 0 free the slot >> [ 1230.656635] nfs4_free_slot: slotid 0 highest_used_slotid >> 4294967295 >> [ 1230.657739] <-- nfs4_layoutget_done >> [ 1230.658650] --> nfs4_layoutget_release >> [ 1230.659626] <-- nfs4_layoutget_release >> >> This debug output is identical for every cycle of the loop. Have to >> stop for the >> day.. more tomorrow. >> >> Ben >> > > Duh… It’s this patch: pNFS: Fix post-layoutget error handling in > pnfs_update_layout() > We have to pass through fatal errors… I’ll fix it. 
That's indeed fixed it up, and generic/207 passes now. Thanks! Ben ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics 2016-07-20 15:03 ` Benjamin Coddington @ 2016-07-21 8:22 ` hch 2016-07-21 8:32 ` Benjamin Coddington 0 siblings, 1 reply; 69+ messages in thread From: hch @ 2016-07-21 8:22 UTC (permalink / raw) To: Benjamin Coddington; +Cc: Trond Myklebust, hch, List Linux On Wed, Jul 20, 2016 at 11:03:06AM -0400, Benjamin Coddington wrote: > > > This debug output is identical for every cycle of the loop. Have to > > > stop for the > > > day.. more tomorrow. > > > > > > Ben > > > > > > > Duh… It’s this patch: pNFS: Fix post-layoutget error handling in > > pnfs_update_layout() > > We have to pass through fatal errors… I’ll fix it. > > That's indeed fixed it up, and generic/207 passes now. Thanks! So I spoke too soon in my last mail, generic/207 still fails for me with Trond's linux-next tree, although much later in the test now. Does it include the changes that are supposed to fix the issue? ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics 2016-07-21 8:22 ` hch @ 2016-07-21 8:32 ` Benjamin Coddington 2016-07-21 9:10 ` Benjamin Coddington 0 siblings, 1 reply; 69+ messages in thread From: Benjamin Coddington @ 2016-07-21 8:32 UTC (permalink / raw) To: hch; +Cc: Trond Myklebust, List Linux On 21 Jul 2016, at 4:22, hch@infradead.org wrote: > On Wed, Jul 20, 2016 at 11:03:06AM -0400, Benjamin Coddington wrote: >>>> This debug output is identical for every cycle of the loop. Have to >>>> stop for the >>>> day.. more tomorrow. >>>> >>>> Ben >>>> >>> >>> Duh… It’s this patch: pNFS: Fix post-layoutget error handling in >>> pnfs_update_layout() >>> We have to pass through fatal errors… I’ll fix it. >> >> That's indeed fixed it up, and generic/207 passes now. Thanks! > > So I spoke too soon in my last mail, generic/207 still fails for me > with Trond's linux-next tree, although much later in the test now. > > Does it include the changes that are supposed to fix the issue? It should -- the v2 that fixed 207 for me is 56b38a1f7c781519eef09c1668a3c97ea911f86b, the first version was e35c2a0b3cd062a8941d21511719391b64437427, I think. I'll test again too. Ben ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics 2016-07-21 8:32 ` Benjamin Coddington @ 2016-07-21 9:10 ` Benjamin Coddington 2016-07-21 9:52 ` Benjamin Coddington 0 siblings, 1 reply; 69+ messages in thread From: Benjamin Coddington @ 2016-07-21 9:10 UTC (permalink / raw) To: hch; +Cc: Trond Myklebust, List Linux On 21 Jul 2016, at 4:32, Benjamin Coddington wrote: > On 21 Jul 2016, at 4:22, hch@infradead.org wrote: > >> On Wed, Jul 20, 2016 at 11:03:06AM -0400, Benjamin Coddington wrote: >>>>> This debug output is identical for every cycle of the loop. Have >>>>> to >>>>> stop for the >>>>> day.. more tomorrow. >>>>> >>>>> Ben >>>>> >>>> >>>> Duh… It’s this patch: pNFS: Fix post-layoutget error handling >>>> in >>>> pnfs_update_layout() >>>> We have to pass through fatal errors… I’ll fix it. >>> >>> That's indeed fixed it up, and generic/207 passes now. Thanks! >> >> So I spoke too soon in my last mail, generic/207 still fails for me >> with Trond's linux-next tree, although much later in the test now. >> >> Does it include the changes that are supposed to fix the issue? > > It should -- the v2 that fixed 207 for me is > 56b38a1f7c781519eef09c1668a3c97ea911f86b, the first version was > e35c2a0b3cd062a8941d21511719391b64437427, I think. I'll test again > too. Looks like we're back to the original problem - it fails with the inode size is 4k less than expected. The reason it worked for me was I had pnfs_ld debugging turned up which slowed things down enough to somehow catch the right size. Looks like the right size is returned in the CLOSE, but the inode's not getting updated. Ben ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics 2016-07-21 9:10 ` Benjamin Coddington @ 2016-07-21 9:52 ` Benjamin Coddington 2016-07-21 12:46 ` Trond Myklebust 0 siblings, 1 reply; 69+ messages in thread From: Benjamin Coddington @ 2016-07-21 9:52 UTC (permalink / raw) To: hch; +Cc: Trond Myklebust, List Linux On 21 Jul 2016, at 5:10, Benjamin Coddington wrote: > On 21 Jul 2016, at 4:32, Benjamin Coddington wrote: > >> On 21 Jul 2016, at 4:22, hch@infradead.org wrote: >> >>> On Wed, Jul 20, 2016 at 11:03:06AM -0400, Benjamin Coddington wrote: >>>>>> This debug output is identical for every cycle of the loop. Have >>>>>> to >>>>>> stop for the >>>>>> day.. more tomorrow. >>>>>> >>>>>> Ben >>>>>> >>>>> >>>>> Duh… It’s this patch: pNFS: Fix post-layoutget error handling >>>>> in >>>>> pnfs_update_layout() >>>>> We have to pass through fatal errors… I’ll fix it. >>>> >>>> That's indeed fixed it up, and generic/207 passes now. Thanks! >>> >>> So I spoke too soon in my last mail, generic/207 still fails for me >>> with Trond's linux-next tree, although much later in the test now. >>> >>> Does it include the changes that are supposed to fix the issue? >> >> It should -- the v2 that fixed 207 for me is >> 56b38a1f7c781519eef09c1668a3c97ea911f86b, the first version was >> e35c2a0b3cd062a8941d21511719391b64437427, I think. I'll test again >> too. > > Looks like we're back to the original problem - it fails with the > inode size is 4k less than expected. > > The reason it worked for me was I had pnfs_ld debugging turned up > which slowed things down enough to somehow catch the right size. > > Looks like the right size is returned in the CLOSE, but the inode's > not getting updated.

And the size is right in the last LAYOUTCOMMIT response, of course. Is this the problem?

static int decode_layoutcommit(struct xdr_stream *xdr,
...
	sizechanged = be32_to_cpup(p);

	if (sizechanged) {
		/* throw away new size */
		p = xdr_inline_decode(xdr, 8);

Ben
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics 2016-07-21 9:52 ` Benjamin Coddington @ 2016-07-21 12:46 ` Trond Myklebust 2016-07-21 13:05 ` Benjamin Coddington 0 siblings, 1 reply; 69+ messages in thread From: Trond Myklebust @ 2016-07-21 12:46 UTC (permalink / raw) To: Coddington Benjamin; +Cc: hch, List Linux > On Jul 21, 2016, at 05:52, Benjamin Coddington <bcodding@redhat.com> wrote: > > On 21 Jul 2016, at 5:10, Benjamin Coddington wrote: > >> On 21 Jul 2016, at 4:32, Benjamin Coddington wrote: >> >>> On 21 Jul 2016, at 4:22, hch@infradead.org wrote: >>> >>>> On Wed, Jul 20, 2016 at 11:03:06AM -0400, Benjamin Coddington wrote: >>>>>>> This debug output is identical for every cycle of the loop. Have to >>>>>>> stop for the >>>>>>> day.. more tomorrow. >>>>>>> >>>>>>> Ben >>>>>>> >>>>>> >>>>>> Duh… It’s this patch: pNFS: Fix post-layoutget error handling in >>>>>> pnfs_update_layout() >>>>>> We have to pass through fatal errors… I’ll fix it. >>>>> >>>>> That's indeed fixed it up, and generic/207 passes now. Thanks! >>>> >>>> So I spoke too soon in my last mail, generic/207 still fails for me >>>> with Trond's linux-next tree, although much later in the test now. >>>> >>>> Does it include the changes that are supposed to fix the issue? >>> >>> It should -- the v2 that fixed 207 for me is 56b38a1f7c781519eef09c1668a3c97ea911f86b, the first version was e35c2a0b3cd062a8941d21511719391b64437427, I think. I'll test again too. >> >> Looks like we're back to the original problem - it fails with the inode size is 4k less than expected. >> >> The reason it worked for me was I had pnfs_ld debugging turned up which slowed things down enough to somehow catch the right size. >> >> Looks like the right size is returned in the CLOSE, but the inode's not getting updated. > > And the size is right in the last LAYOUTCOMMIT response, of course. Is this the problem? > > static int decode_layoutcommit(struct xdr_stream *xdr, > ... > sizechanged = be32_to_cpup(p); > > if (sizechanged) { > /* throw away new size */ > p = xdr_inline_decode(xdr, 8); > > > Ben > That shouldn’t really matter since we have a GETATTR immediately following the LAYOUTGET operation. Assuming that nfs4_proc_layoutcommit() actually gets called, it is supposed to update the inode correctly. Cheers Trond ^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics 2016-07-21 12:46 ` Trond Myklebust @ 2016-07-21 13:05 ` Benjamin Coddington 2016-07-21 13:20 ` Trond Myklebust 0 siblings, 1 reply; 69+ messages in thread From: Benjamin Coddington @ 2016-07-21 13:05 UTC (permalink / raw) To: Trond Myklebust; +Cc: hch, List Linux On 21 Jul 2016, at 8:46, Trond Myklebust wrote: >> On Jul 21, 2016, at 05:52, Benjamin Coddington <bcodding@redhat.com> >> wrote: >> >> On 21 Jul 2016, at 5:10, Benjamin Coddington wrote: >> >>> On 21 Jul 2016, at 4:32, Benjamin Coddington wrote: >>> >>>> On 21 Jul 2016, at 4:22, hch@infradead.org wrote: >>>> >>>>> On Wed, Jul 20, 2016 at 11:03:06AM -0400, Benjamin Coddington >>>>> wrote: >>>>>>>> This debug output is identical for every cycle of the loop. >>>>>>>> Have to >>>>>>>> stop for the >>>>>>>> day.. more tomorrow. >>>>>>>> >>>>>>>> Ben >>>>>>>> >>>>>>> >>>>>>> Duh… It’s this patch: pNFS: Fix post-layoutget error >>>>>>> handling in >>>>>>> pnfs_update_layout() >>>>>>> We have to pass through fatal errors… I’ll fix it. >>>>>> >>>>>> That's indeed fixed it up, and generic/207 passes now. Thanks! >>>>> >>>>> So I spoke too soon in my last mail, generic/207 still fails for >>>>> me >>>>> with Trond's linux-next tree, although much later in the test now. >>>>> >>>>> Does it include the changes that are supposed to fix the issue? >>>> >>>> It should -- the v2 that fixed 207 for me is >>>> 56b38a1f7c781519eef09c1668a3c97ea911f86b, the first version was >>>> e35c2a0b3cd062a8941d21511719391b64437427, I think. I'll test again >>>> too. >>> >>> Looks like we're back to the original problem - it fails with the >>> inode size is 4k less than expected. >>> >>> The reason it worked for me was I had pnfs_ld debugging turned up >>> which slowed things down enough to somehow catch the right size. >>> >>> Looks like the right size is returned in the CLOSE, but the inode's >>> not getting updated.
>> >> And the size is right in the last LAYOUTCOMMIT response, of course. >> Is this the problem? >> >> 6024 static int decode_layoutcommit(struct xdr_stream *xdr, >> ... >> 6040 sizechanged = be32_to_cpup(p); >> 6041 >> 6042 if (sizechanged) { >> 6043 /* throw away new size */ >> 6044 p = xdr_inline_decode(xdr, 8); >> >> >> Ben >> > > That shouldn’t really matter since we have a GETATTR immediately > following the LAYOUTGET operation. Assuming that > nfs4_proc_layoutcommit() actually gets called, it is supposed to > update the inode correctly. So back to Christoph's point earlier: On 17 Jul 2016, at 23:48, Christoph Hellwig wrote: > This one breaks xfstests generic/207 on block/scsi layout for me. The > reason for that is that we need a layoutcommit after writing out all > data for the file for the file size to be updated on the server. You responded: On 18 Jul 2016, at 0:32, Trond Myklebust wrote: > I’m not understanding this argument. Why do we care if the file size > is up > to date on the server if we’re not sending an actual GETATTR on the > wire > to retrieve the file size? I guess the answer might be because we can get it back from the last LAYOUTCOMMIT. 
This test repeatedly appends 4k and has this pattern on the wire:

NFS 334 V4 Call LAYOUTGET
NFS 290 V4 Reply (Call In 854) LAYOUTCOMMIT
NFS 294 V4 Call GETATTR FH: 0x4f5528b0
NFS 442 V4 Reply (Call In 858) GETATTR
NFS 374 V4 Call LAYOUTCOMMIT
NFS 314 V4 Reply (Call In 856) LAYOUTGET
NFS 334 V4 Call LAYOUTGET
NFS 290 V4 Reply (Call In 860) LAYOUTCOMMIT
NFS 294 V4 Call GETATTR FH: 0x4f5528b0
NFS 442 V4 Reply (Call In 864) GETATTR
NFS 374 V4 Call LAYOUTCOMMIT
NFS 314 V4 Reply (Call In 862) LAYOUTGET
NFS 334 V4 Call LAYOUTGET
NFS 290 V4 Reply (Call In 866) LAYOUTCOMMIT
NFS 294 V4 Call GETATTR FH: 0x4f5528b0
NFS 442 V4 Reply (Call In 870) GETATTR
NFS 314 V4 Reply (Call In 868) LAYOUTGET
NFS 374 V4 Call LAYOUTCOMMIT
NFS 290 V4 Reply (Call In 874) LAYOUTCOMMIT
NFS 314 V4 Call CLOSE StateID: 0x54d9
NFS 294 V4 Reply (Call In 876) CLOSE

That last LAYOUTCOMMIT and the CLOSE have the size we want. The previous GETATTR is 4k short.

Ben
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics 2016-07-21 13:05 ` Benjamin Coddington @ 2016-07-21 13:20 ` Trond Myklebust 2016-07-21 14:00 ` Trond Myklebust ` (2 more replies) 0 siblings, 3 replies; 69+ messages in thread From: Trond Myklebust @ 2016-07-21 13:20 UTC (permalink / raw) To: Coddington Benjamin; +Cc: hch, List Linux

> On Jul 21, 2016, at 09:05, Benjamin Coddington <bcodding@redhat.com> wrote:
>
> So back to Christoph's point earlier:
>
> On 17 Jul 2016, at 23:48, Christoph Hellwig wrote:
>> This one breaks xfstests generic/207 on block/scsi layout for me.  The
>> reason for that is that we need a layoutcommit after writing out all
>> data for the file for the file size to be updated on the server.
>
> You responded:
>
> On 18 Jul 2016, at 0:32, Trond Myklebust wrote:
>> I’m not understanding this argument. Why do we care if the file size is up
>> to date on the server if we’re not sending an actual GETATTR on the wire
>> to retrieve the file size?
>
> I guess the answer might be because we can get it back from the last
> LAYOUTCOMMIT.
>

The patch that I followed up with should now ensure that we do not mark the attribute cache as up to date if there is a LAYOUTCOMMIT pending.
IOW: when the pNFS write is done, it is expected to do 2 things:

1) mark the inode for LAYOUTCOMMIT
2) mark the attribute cache as invalid (because we know the change attribute, mtime, ctime need to be updates)

In the case of blocks pNFS write:
The call to pnfs_set_layoutcommit() in pnfs_ld_write_done() should take care of (1)
The call to nfs_writeback_update_inode() in nfs4_write_done_cb() should take care of (2).

Provided that these 2 calls are performed in the above order, then any call to nfs_getattr() which has not been preceded by a call to nfs4_proc_layoutcommit() should trigger the call to __nfs_revalidate_inode().

> This test has repeated appending 4k and has this pattern on the wire:
>
> NFS 334 V4 Call LAYOUTGET
> NFS 290 V4 Reply (Call In 854) LAYOUTCOMMIT
> NFS 294 V4 Call GETATTR FH: 0x4f5528b0
> NFS 442 V4 Reply (Call In 858) GETATTR
> NFS 374 V4 Call LAYOUTCOMMIT
> NFS 314 V4 Reply (Call In 856) LAYOUTGET
> NFS 334 V4 Call LAYOUTGET
> NFS 290 V4 Reply (Call In 860) LAYOUTCOMMIT
> NFS 294 V4 Call GETATTR FH: 0x4f5528b0
> NFS 442 V4 Reply (Call In 864) GETATTR
> NFS 374 V4 Call LAYOUTCOMMIT
> NFS 314 V4 Reply (Call In 862) LAYOUTGET
> NFS 334 V4 Call LAYOUTGET
> NFS 290 V4 Reply (Call In 866) LAYOUTCOMMIT
> NFS 294 V4 Call GETATTR FH: 0x4f5528b0
> NFS 442 V4 Reply (Call In 870) GETATTR
> NFS 314 V4 Reply (Call In 868) LAYOUTGET
> NFS 374 V4 Call LAYOUTCOMMIT
> NFS 290 V4 Reply (Call In 874) LAYOUTCOMMIT
> NFS 314 V4 Call CLOSE StateID: 0x54d9
> NFS 294 V4 Reply (Call In 876) CLOSE
>
> That last LAYOUTCOMMIT and the CLOSE have the size we want.  The previous
> GETATTR is 4k short.

When you say “4k short”, do you mean that it differs from the value returned by the LAYOUTCOMMIT that immediately precedes it? It looks as if there is a LAYOUTGET immediately following it, so presumably the writes have not all completed yet.

Cheers
  Trond

^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics 2016-07-21 13:20 ` Trond Myklebust @ 2016-07-21 14:00 ` Trond Myklebust 2016-07-21 14:02 ` Benjamin Coddington 2016-07-25 16:26 ` Benjamin Coddington 2 siblings, 0 replies; 69+ messages in thread From: Trond Myklebust @ 2016-07-21 14:00 UTC (permalink / raw) To: Coddington Benjamin; +Cc: hch, List Linux

> On Jul 21, 2016, at 09:20, Trond Myklebust <trondmy@primarydata.com> wrote:
>
>
>> On Jul 21, 2016, at 09:05, Benjamin Coddington <bcodding@redhat.com> wrote:
>>
>> So back to Christoph's point earlier:
>>
>> On 17 Jul 2016, at 23:48, Christoph Hellwig wrote:
>>> This one breaks xfstests generic/207 on block/scsi layout for me.  The
>>> reason for that is that we need a layoutcommit after writing out all
>>> data for the file for the file size to be updated on the server.
>>
>> You responded:
>>
>> On 18 Jul 2016, at 0:32, Trond Myklebust wrote:
>>> I’m not understanding this argument. Why do we care if the file size is up
>>> to date on the server if we’re not sending an actual GETATTR on the wire
>>> to retrieve the file size?
>>
>> I guess the answer might be because we can get it back from the last
>> LAYOUTCOMMIT.
>>
>
> The patch that I followed up with should now ensure that we do not mark the attribute cache as up to date if there is a LAYOUTCOMMIT pending.
> IOW: when the pNFS write is done, it is expected to do 2 things:
>
> 1) mark the inode for LAYOUTCOMMIT
> 2) mark the attribute cache as invalid (because we know the change attribute, mtime, ctime need to be updates)
>
> In the case of blocks pNFS write:
> The call to pnfs_set_layoutcommit() in pnfs_ld_write_done() should take care of (1)
> The call to nfs_writeback_update_inode() in nfs4_write_done_cb() should take care of (2).
>
> Provided that these 2 calls are performed in the above order, then any call to nfs_getattr() which has not been preceded by a call to nfs4_proc_layoutcommit() should trigger the call to __nfs_revalidate_inode().

By the way, it looks as if the ‘files’ layout type fails to do (2). I’ll add a fix for that.

Cheers
  Trond
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics 2016-07-21 13:20 ` Trond Myklebust 2016-07-21 14:00 ` Trond Myklebust @ 2016-07-21 14:02 ` Benjamin Coddington 2016-07-25 16:26 ` Benjamin Coddington 2 siblings, 0 replies; 69+ messages in thread From: Benjamin Coddington @ 2016-07-21 14:02 UTC (permalink / raw) To: Trond Myklebust; +Cc: hch, List Linux On 21 Jul 2016, at 9:20, Trond Myklebust wrote: >> On Jul 21, 2016, at 09:05, Benjamin Coddington <bcodding@redhat.com> >> wrote: >> >> So back to Christoph's point earlier: >> >> On 17 Jul 2016, at 23:48, Christoph Hellwig wrote: >>> This one breaks xfstests generic/207 on block/scsi layout for me. >>> The >>> reason for that is that we need a layoutcommit after writing out all >>> data for the file for the file size to be updated on the server. >> >> You responded: >> >> On 18 Jul 2016, at 0:32, Trond Myklebust wrote: >>> I’m not understanding this argument. Why do we care if the file >>> size is up >>> to date on the server if we’re not sending an actual GETATTR on >>> the wire >>> to retrieve the file size? >> >> I guess the answer might be because we can get it back from the last >> LAYOUTCOMMIT. >> > > The patch that I followed up with should now ensure that we do not > mark the attribute cache as up to date if there is a LAYOUTCOMMIT > pending. > IOW: when the pNFS write is done, it is expected to do 2 things: > > 1) mark the inode for LAYOUTCOMMIT > 2) mark the attribute cache as invalid (because we know the change > attribute, mtime, ctime need to be updates) > > In the case of blocks pNFS write: > The call to pnfs_set_layoutcommit() in pnfs_ld_write_done() should > take care of (1) > The call to nfs_writeback_update_inode() in nfs4_write_done_cb() > should take care of (2). 
> > Provided that these 2 calls are performed in the above order, then any > call to nfs_getattr() which has not been preceded by a call to > nfs4_proc_layoutcommit() should trigger the call to > __nfs_revalidate_inode(). OK, so maybe things are out of order here.. Thanks - this is helpful. >> This test has repeated appending 4k and has this pattern on the wire: >> >> NFS 334 V4 Call LAYOUTGET >> NFS 290 V4 Reply (Call In 854) LAYOUTCOMMIT >> NFS 294 V4 Call GETATTR FH: 0x4f5528b0 >> NFS 442 V4 Reply (Call In 858) GETATTR >> NFS 374 V4 Call LAYOUTCOMMIT >> NFS 314 V4 Reply (Call In 856) LAYOUTGET >> NFS 334 V4 Call LAYOUTGET >> NFS 290 V4 Reply (Call In 860) LAYOUTCOMMIT >> NFS 294 V4 Call GETATTR FH: 0x4f5528b0 >> NFS 442 V4 Reply (Call In 864) GETATTR >> NFS 374 V4 Call LAYOUTCOMMIT >> NFS 314 V4 Reply (Call In 862) LAYOUTGET >> NFS 334 V4 Call LAYOUTGET >> NFS 290 V4 Reply (Call In 866) LAYOUTCOMMIT >> NFS 294 V4 Call GETATTR FH: 0x4f5528b0 >> NFS 442 V4 Reply (Call In 870) GETATTR >> NFS 314 V4 Reply (Call In 868) LAYOUTGET >> NFS 374 V4 Call LAYOUTCOMMIT >> NFS 290 V4 Reply (Call In 874) LAYOUTCOMMIT >> NFS 314 V4 Call CLOSE StateID: 0x54d9 >> NFS 294 V4 Reply (Call In 876) CLOSE >> >> That last LAYOUTCOMMIT and the CLOSE have the size we want. The >> previous >> GETATTR is 4k short. > > When you say “4k short”, do you mean that it differs from the > value returned by the LAYOUTCOMMIT that immediately precedes it? It > looks as if there is a LAYOUTGET immediately following it, so > presumably the writes have not all completed yet. I meant it is 4k short of the final size returned in the LAYOUTCOMMIT immediately following. It is the same as the LAYOUTCOMMIT immediately preceding. 
FWIW at this point, here's a better look at the network:

tshark -r /tmp/pcap -T fields -e frame.number -e nfs.fh.hash -e nfs.opcode -e nfs.fattr4.size 'frame.number > 860'

861               53,22,50
862  0x4f5528b0   53,22,50
863               53,22,49,9   409600
864  0x4f5528b0   53,22,9
865               53,22,9      409600
866  0x4f5528b0   53,22,49,9
867               53,22,50
868  0x4f5528b0   53,22,50
869               53,22,49,9   413696
870  0x4f5528b0   53,22,9
871               53,22,9      413696
872               53,22,50
873
874  0x4f5528b0   53,22,49,9
875               53,22,49,9   417792
876  0x4f5528b0   53,22,4,9
877               53,22,4,9    417792
...
880  0x4f5528b0   53,22,9
881               53,22,9      417792

Now I see there's a GETATTR after the CLOSE that returns the right size -- but I really should track down what's happening to that one to see if it is the same call that the test is making.

Unfortunately, I'm getting pulled away again, so I'll dig more later. Thanks for looking.

Ben
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics 2016-07-21 13:20 ` Trond Myklebust 2016-07-21 14:00 ` Trond Myklebust 2016-07-21 14:02 ` Benjamin Coddington @ 2016-07-25 16:26 ` Benjamin Coddington 2016-07-25 16:39 ` Trond Myklebust 2 siblings, 1 reply; 69+ messages in thread From: Benjamin Coddington @ 2016-07-25 16:26 UTC (permalink / raw) To: Trond Myklebust; +Cc: hch, List Linux On 21 Jul 2016, at 9:20, Trond Myklebust wrote: >> On Jul 21, 2016, at 09:05, Benjamin Coddington <bcodding@redhat.com> >> wrote: >> >> So back to Christoph's point earlier: >> >> On 17 Jul 2016, at 23:48, Christoph Hellwig wrote: >>> This one breaks xfstests generic/207 on block/scsi layout for me. >>> The >>> reason for that is that we need a layoutcommit after writing out all >>> data for the file for the file size to be updated on the server. >> >> You responded: >> >> On 18 Jul 2016, at 0:32, Trond Myklebust wrote: >>> I’m not understanding this argument. Why do we care if the file >>> size is up >>> to date on the server if we’re not sending an actual GETATTR on >>> the wire >>> to retrieve the file size? >> >> I guess the answer might be because we can get it back from the last >> LAYOUTCOMMIT. >> > > The patch that I followed up with should now ensure that we do not > mark the attribute cache as up to date if there is a LAYOUTCOMMIT > pending. > IOW: when the pNFS write is done, it is expected to do 2 things: > > 1) mark the inode for LAYOUTCOMMIT > 2) mark the attribute cache as invalid (because we know the change > attribute, mtime, ctime need to be updates) > > In the case of blocks pNFS write: > The call to pnfs_set_layoutcommit() in pnfs_ld_write_done() should > take care of (1) > The call to nfs_writeback_update_inode() in nfs4_write_done_cb() > should take care of (2). 
>
> Provided that these 2 calls are performed in the above order, then any
> call to nfs_getattr() which has not been preceded by a call to
> nfs4_proc_layoutcommit() should trigger the call to
> __nfs_revalidate_inode().

I think the problem is that a following nfs_getattr() will fail to notice the size change in the case of a write_completion and layoutcommit occurring after nfs_getattr() has done pnfs_sync_inode() but before it has done nfs_update_inode().

In the failing case there are two threads: one is doing writes, the other is doing lstat on aio_complete via io_getevents(2).

For each write completion the lstat thread tries to verify the file size.

GETATTR Thread                  LAYOUTCOMMIT Thread
--------------                  --------------------
                                write_completion sets LAYOUTCOMMIT (4096@0)
--> nfs_getattr
__nfs_revalidate_inode
 pnfs_sync_inode
 getattr sees 4096
                                write_completion sets LAYOUTCOMMIT (4096@4096)
                                sets LAYOUTCOMMITTING
                                clears LAYOUTCOMMIT
                                clears LAYOUTCOMMITTING
 nfs_refresh_inode
  nfs_update_inode size is 4096
<-- nfs_getattr

At this point the cached attributes are seen as up to date, but the aio-dio-extend-stat program expects that second write_completion to be reflected in the file size.

Ben
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics
  2016-07-25 16:26 ` Benjamin Coddington
@ 2016-07-25 16:39   ` Trond Myklebust
  2016-07-25 18:26     ` Benjamin Coddington
  0 siblings, 1 reply; 69+ messages in thread
From: Trond Myklebust @ 2016-07-25 16:39 UTC (permalink / raw)
  To: Coddington Benjamin; +Cc: hch, List Linux NFS Mailing

> On Jul 25, 2016, at 12:26, Benjamin Coddington <bcodding@redhat.com> wrote:
>
> I think the problem is that a following nfs_getattr() will fail to notice
> the size change in the case of a write_completion and layoutcommit occurring
> after nfs_getattr() has done pnfs_sync_inode() but before it has done
> nfs_update_inode().
>
> In the failing case there are two threads: one is doing writes, the other is
> doing lstat on aio_complete via io_getevents(2).
>
> For each write completion the lstat thread tries to verify the file size.
>
> GETATTR Thread                  LAYOUTCOMMIT Thread
> --------------                  --------------------
>                                 write_completion sets LAYOUTCOMMIT (4096@0)
> --> nfs_getattr

filemap_write_and_wait()

> __nfs_revalidate_inode
>  pnfs_sync_inode

NFS_PROTO(inode)->getattr()

>  getattr sees 4096
>                                 write_completion sets LAYOUTCOMMIT (4096@4096)
>                                 sets LAYOUTCOMMITTING
>                                 clears LAYOUTCOMMIT
>                                 clears LAYOUTCOMMITTING
>  nfs_refresh_inode
>   nfs_update_inode size is 4096
> <-- nfs_getattr
>
> At this point the cached attributes are seen as up to date, but the
> aio-dio-extend-stat program expects that second write_completion to be
> reflected in the file size.
>

Why isn’t the filemap_write_and_wait() above resolving the race? I’d expect
that would move your “write completion sets LAYOUTCOMMIT” up to before the
pnfs_sync_inode().
In fact, in the patch that Christoph sent, all he was doing was moving the
pnfs_sync_inode() to immediately after that filemap_write_and_wait() instead
of relying on it in __nfs_revalidate_inode().

^ permalink raw reply	[flat|nested] 69+ messages in thread
* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics
  2016-07-25 16:39 ` Trond Myklebust
@ 2016-07-25 18:26   ` Benjamin Coddington
  2016-07-25 18:34     ` Trond Myklebust
  0 siblings, 1 reply; 69+ messages in thread
From: Benjamin Coddington @ 2016-07-25 18:26 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: hch, List Linux NFS Mailing

On 25 Jul 2016, at 12:39, Trond Myklebust wrote:

> Why isn’t the filemap_write_and_wait() above resolving the race? I’d
> expect that would move your “write completion sets LAYOUTCOMMIT” up to
> before the pnfs_sync_inode().  In fact, in the patch that Christoph sent,
> all he was doing was moving the pnfs_sync_inode() to immediately after
> that filemap_write_and_wait() instead of relying on it in
> __nfs_revalidate_inode().

This is O_DIRECT, which I've failed to mention until now.  The second write
hasn't made it out of __nfs_pageio_add_request() at the time
filemap_write_and_wait() is called.  It is sleeping in pnfs_update_layout()
waiting on a LAYOUTGET, and it doesn't resume until after
filemap_write_and_wait().

^ permalink raw reply	[flat|nested] 69+ messages in thread
* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics
  2016-07-25 18:26 ` Benjamin Coddington
@ 2016-07-25 18:34   ` Trond Myklebust
  2016-07-25 18:41     ` Benjamin Coddington
  0 siblings, 1 reply; 69+ messages in thread
From: Trond Myklebust @ 2016-07-25 18:34 UTC (permalink / raw)
  To: Coddington Benjamin; +Cc: hch, List Linux NFS Mailing

> On Jul 25, 2016, at 14:26, Benjamin Coddington <bcodding@redhat.com> wrote:
>
> This is O_DIRECT, which I've failed to mention until now.  The second write
> hasn't made it out of __nfs_pageio_add_request() at the time
> filemap_write_and_wait() is called.  It is sleeping in pnfs_update_layout()
> waiting on a LAYOUTGET, and it doesn't resume until after
> filemap_write_and_wait().

Wait, so you have 1 thread doing an O_DIRECT write() and another doing a
stat() in parallel? Why would there be an expectation that the filesystem
should serialise those system calls?

^ permalink raw reply	[flat|nested] 69+ messages in thread
* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics
  2016-07-25 18:34 ` Trond Myklebust
@ 2016-07-25 18:41   ` Benjamin Coddington
  2016-07-26 16:32     ` Benjamin Coddington
  0 siblings, 1 reply; 69+ messages in thread
From: Benjamin Coddington @ 2016-07-25 18:41 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: hch, List Linux NFS Mailing

On 25 Jul 2016, at 14:34, Trond Myklebust wrote:

> Wait, so you have 1 thread doing an O_DIRECT write() and another doing a
> stat() in parallel? Why would there be an expectation that the filesystem
> should serialise those system calls?

Not exactly parallel, but synchronized on aio_complete.  A comment in
generic/207's src/aio-dio-regress/aio-dio-extend-stat.c:

/*
 * This was originally submitted to
 * http://bugzilla.kernel.org/show_bug.cgi?id=6831 by
 * Rafal Wijata <wijata@nec-labs.com>.  It caught a race in dio aio completion
 * that would call aio_complete() before the dio callers would update i_size.
 * A stat after io_getevents() would not see the new file size.
 *
 * The bug was fixed in the fs/direct-io.c completion reworking that appeared
 * in 2.6.20.  This test should fail on 2.6.19.
 */

As far as I can see, this check is the whole point of generic/207.

^ permalink raw reply	[flat|nested] 69+ messages in thread
* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics
  2016-07-25 18:41 ` Benjamin Coddington
@ 2016-07-26 16:32   ` Benjamin Coddington
  2016-07-26 16:35     ` Trond Myklebust
  0 siblings, 1 reply; 69+ messages in thread
From: Benjamin Coddington @ 2016-07-26 16:32 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: hch, List Linux NFS Mailing

On 25 Jul 2016, at 14:41, Benjamin Coddington wrote:

> Not exactly parallel, but synchronized on aio_complete.  A comment in
> generic/207's src/aio-dio-regress/aio-dio-extend-stat.c:
>
> /*
>  * This was originally submitted to
>  * http://bugzilla.kernel.org/show_bug.cgi?id=6831 by
>  * Rafal Wijata <wijata@nec-labs.com>.  It caught a race in dio aio completion
>  * that would call aio_complete() before the dio callers would update i_size.
>  * A stat after io_getevents() would not see the new file size.
>  *
>  * The bug was fixed in the fs/direct-io.c completion reworking that appeared
>  * in 2.6.20.  This test should fail on 2.6.19.
>  */
>
> As far as I can see, this check is the whole point of generic/207.

This would fix it up:

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index f108d58101f8..823700f827b6 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -661,6 +661,7 @@ int nfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)

        trace_nfs_getattr_enter(inode);
        /* Flush out writes to the server in order to update c/mtime.  */
+       nfs_start_io_read(inode);
        if (S_ISREG(inode->i_mode)) {
                err = filemap_write_and_wait(inode->i_mapping);
                if (err)
@@ -694,6 +695,7 @@ int nfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
                        stat->blksize = NFS_SERVER(inode)->dtsize;
        }
 out:
+       nfs_end_io_read(inode);
        trace_nfs_getattr_exit(inode, err);
        return err;
 }

Trond, what do you think?  I'll take any additional silence as a sign to go
elsewhere.  :P

Ben

^ permalink raw reply related	[flat|nested] 69+ messages in thread
* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics
  2016-07-26 16:32 ` Benjamin Coddington
@ 2016-07-26 16:35   ` Trond Myklebust
  2016-07-26 17:57     ` Benjamin Coddington
  0 siblings, 1 reply; 69+ messages in thread
From: Trond Myklebust @ 2016-07-26 16:35 UTC (permalink / raw)
  To: Coddington Benjamin; +Cc: hch, List Linux NFS Mailing

> On Jul 26, 2016, at 12:32, Benjamin Coddington <bcodding@redhat.com> wrote:
>
> This would fix it up:
>
> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> index f108d58101f8..823700f827b6 100644
> --- a/fs/nfs/inode.c
> +++ b/fs/nfs/inode.c
> @@ -661,6 +661,7 @@ int nfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
>
>         trace_nfs_getattr_enter(inode);
>         /* Flush out writes to the server in order to update c/mtime.  */
> +       nfs_start_io_read(inode);
>         if (S_ISREG(inode->i_mode)) {
>                 err = filemap_write_and_wait(inode->i_mapping);
>                 if (err)
> @@ -694,6 +695,7 @@ int nfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
>                         stat->blksize = NFS_SERVER(inode)->dtsize;
>         }
>  out:
> +       nfs_end_io_read(inode);
>         trace_nfs_getattr_exit(inode, err);
>         return err;
>  }
>
> Trond, what do you think?  I'll take any additional silence as a sign to go
> elsewhere.  :P

No. The above locking excludes all writes as well as O_DIRECT reads… That’s
worse than we had before.

I’d like rather to understand _why_ the aio_complete() is failing to work
correctly here. According to the analysis of the test case that you quoted
yesterday, the O_DIRECT writes should have completed before we even call
stat().

Cheers
  Trond

^ permalink raw reply	[flat|nested] 69+ messages in thread
* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics 2016-07-26 16:35 ` Trond Myklebust @ 2016-07-26 17:57 ` Benjamin Coddington 2016-07-26 18:07 ` Trond Myklebust 0 siblings, 1 reply; 69+ messages in thread From: Benjamin Coddington @ 2016-07-26 17:57 UTC (permalink / raw) To: Trond Myklebust; +Cc: hch, List Linux NFS Mailing On 26 Jul 2016, at 12:35, Trond Myklebust wrote: >> On Jul 26, 2016, at 12:32, Benjamin Coddington <bcodding@redhat.com> >> wrote: >> >> On 25 Jul 2016, at 14:41, Benjamin Coddington wrote: >> >>> On 25 Jul 2016, at 14:34, Trond Myklebust wrote: >>> >>>>> On Jul 25, 2016, at 14:26, Benjamin Coddington >>>>> <bcodding@redhat.com> wrote: >>>>> >>>>> >>>>> >>>>> On 25 Jul 2016, at 12:39, Trond Myklebust wrote: >>>>> >>>>>>> On Jul 25, 2016, at 12:26, Benjamin Coddington >>>>>>> <bcodding@redhat.com> wrote: >>>>>>> >>>>>>> On 21 Jul 2016, at 9:20, Trond Myklebust wrote: >>>>>>> >>>>>>>>> On Jul 21, 2016, at 09:05, Benjamin Coddington >>>>>>>>> <bcodding@redhat.com> wrote: >>>>>>>>> >>>>>>>>> So back to Christoph's point earlier: >>>>>>>>> >>>>>>>>> On 17 Jul 2016, at 23:48, Christoph Hellwig wrote: >>>>>>>>>> This one breaks xfstests generic/207 on block/scsi layout for >>>>>>>>>> me. The >>>>>>>>>> reason for that is that we need a layoutcommit after writing >>>>>>>>>> out all >>>>>>>>>> data for the file for the file size to be updated on the >>>>>>>>>> server. >>>>>>>>> >>>>>>>>> You responded: >>>>>>>>> >>>>>>>>> On 18 Jul 2016, at 0:32, Trond Myklebust wrote: >>>>>>>>>> I’m not understanding this argument. Why do we care if the >>>>>>>>>> file size is up >>>>>>>>>> to date on the server if we’re not sending an actual >>>>>>>>>> GETATTR on the wire >>>>>>>>>> to retrieve the file size? >>>>>>>>> >>>>>>>>> I guess the answer might be because we can get it back from >>>>>>>>> the last >>>>>>>>> LAYOUTCOMMIT. 
>>>>>>>>> >>>>>>>> >>>>>>>> The patch that I followed up with should now ensure that we do >>>>>>>> not mark the attribute cache as up to date if there is a >>>>>>>> LAYOUTCOMMIT pending. >>>>>>>> IOW: when the pNFS write is done, it is expected to do 2 >>>>>>>> things: >>>>>>>> >>>>>>>> 1) mark the inode for LAYOUTCOMMIT >>>>>>>> 2) mark the attribute cache as invalid (because we know the >>>>>>>> change attribute, mtime, ctime need to be updates) >>>>>>>> >>>>>>>> In the case of blocks pNFS write: >>>>>>>> The call to pnfs_set_layoutcommit() in pnfs_ld_write_done() >>>>>>>> should take care of (1) >>>>>>>> The call to nfs_writeback_update_inode() in >>>>>>>> nfs4_write_done_cb() should take care of (2). >>>>>>>> >>>>>>>> Provided that these 2 calls are performed in the above order, >>>>>>>> then any call to nfs_getattr() which has not been preceded by a >>>>>>>> call to nfs4_proc_layoutcommit() should trigger the call to >>>>>>>> __nfs_revalidate_inode(). >>>>>>> >>>>>>> I think the problem is that a following nfs_getattr() will fail >>>>>>> to notice >>>>>>> the size change in the case of a write_completion and >>>>>>> layoutcommit occuring >>>>>>> after nfs_getattr() has done pnfs_sync_inode() but before it has >>>>>>> done >>>>>>> nfs_update_inode(). >>>>>>> >>>>>>> In the failing case there are two threads one is doing writes, >>>>>>> the other >>>>>>> doing lstat on aio_complete via io_getevents(2). >>>>>>> >>>>>>> For each write completion the lstat thread tries to verify the >>>>>>> file size. 
>>>>>>> >>>>>>> GETATTR Thread LAYOUTCOMMIT Thread >>>>>>> -------------- -------------------- >>>>>>> write_completion sets LAYOUTCOMMIT >>>>>>> (4096@0) >>>>>>> --> nfs_getattr >>>>>> >>>>>> filemap_write_and_wait() >>>>>> >>>>>>> __nfs_revalidate_inode >>>>>>> pnfs_sync_inode >>>>>> >>>>>> NFS_PROTO(inode)->getattr() >>>>>> >>>>>>> getattr sees 4096 >>>>>>> write_completion sets LAYOUTCOMMIT >>>>>>> (4096@4096) >>>>>>> sets LAYOUTCOMMITING >>>>>>> clears LAYOUTCOMMIT >>>>>>> clears LAYOUTCOMMITTING >>>>>>> nfs_refresh_inode >>>>>>> nfs_update_inode size is 4096 >>>>>>> <-- nfs_getattr >>>>>>> >>>>>>> At this point the cached attributes are seen as up to date, but >>>>>>> aio-dio-extend-stat program expects that second write_completion >>>>>>> to reflect >>>>>>> in the file size. >>>>>>> >>>>>> >>>>>> Why isn’t the filemap_write_and_wait() above resolving the >>>>>> race? I’d >>>>>> expect that would move your “write completion sets >>>>>> LAYOUTCOMMIT” up to >>>>>> before the pnfs_sync_inode(). In fact, in the patch that >>>>>> Christoph sent, >>>>>> all he was doing was moving the pnfs_sync_inode() to immediately >>>>>> after >>>>>> that filemap_write_and_wait() instead of relying on it in >>>>>> _nfs_revalidate_inode. >>>>> >>>>> This is O_DIRECT, I've failed to mention yet. The second write >>>>> hasn't made >>>>> it out of __nfs_pageio_add_request() at the time >>>>> filemap_write_and_wait() is >>>>> called. It is sleeping in pnfs_update_layout() waiting on a >>>>> LAYOUTGET and it >>>>> doesn't resumes until after filemap_write_and_wait(). >>>> >>>> Wait, so you have 1 thread doing an O_DIRECT write() and another >>>> doing a >>>> stat() in parallel? Why would there be an expectation that the >>>> filesystem >>>> should serialise those system calls? >>> >>> Not exactly parallel, but synchronized on aio_complete. 
A comment >>> in >>> generic/207's src/aio-dio-regress/aio-dio-extend-stat.c: >>> >>> 36 /* >>> 37 * This was originally submitted to >>> 38 * http://bugzilla.kernel.org/show_bug.cgi?id=6831 by >>> 39 * Rafal Wijata <wijata@nec-labs.com>. It caught a race in dio >>> aio completion >>> 40 * that would call aio_complete() before the dio callers would >>> update i_size. >>> 41 * A stat after io_getevents() would not see the new file size. >>> 42 * >>> 43 * The bug was fixed in the fs/direct-io.c completion reworking >>> that appeared >>> 44 * in 2.6.20. This test should fail on 2.6.19. >>> 45 */ >>> >>> As far as I can see, this check is the whole point of generic/207.. >> >> This would fix it up: >> >> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c >> index f108d58101f8..823700f827b6 100644 >> --- a/fs/nfs/inode.c >> +++ b/fs/nfs/inode.c >> @@ -661,6 +661,7 @@ int nfs_getattr(struct vfsmount *mnt, struct >> dentry >> *dentry, struct kstat *stat) >> >> trace_nfs_getattr_enter(inode); >> /* Flush out writes to the server in order to update c/mtime. >> */ >> + nfs_start_io_read(inode); >> if (S_ISREG(inode->i_mode)) { >> err = filemap_write_and_wait(inode->i_mapping); >> if (err) >> @@ -694,6 +695,7 @@ int nfs_getattr(struct vfsmount *mnt, struct >> dentry >> *dentry, struct kstat *stat) >> stat->blksize = NFS_SERVER(inode)->dtsize; >> } >> out: >> + nfs_end_io_read(inode); >> trace_nfs_getattr_exit(inode, err); >> return err; >> } >> >> Trond, what do you think? I'll take any additional silence as a sign >> to go >> elsewhere. :P > > No. The above locking excludes all writes as well as O_DIRECT reads… > That’s worse than we had before. > > I’d like rather to understand _why_ the aio_complete() is failing to > work correctly here. According to the analysis of the test case that > you quoted yesterday, the O_DIRECT writes should have completed before > we even call stat(). 
>
> Cheers
>   Trond

The O_DIRECT writes do complete, and every completion signals the other
thread to do stat(), but that completion does not update the size on the
server.  As we know, we need a LAYOUTCOMMIT.  After this patch, we're only
going to do a LAYOUTCOMMIT if nfs_need_revalidate_inode(inode).

So what happens is that the first write completes, and the first
nfs_getattr() triggers the first LAYOUTCOMMIT, and then a GETATTR.
Simultaneously, the second write is waiting on a LAYOUTGET.  The GETATTR
completes and sets the size to 4k.

Now the attributes are marked up to date with a size of 4k, and when the
second write completes, nfs_getattr() is called, nfs_need_revalidate_inode()
is false, and we don't bother to send another LAYOUTCOMMIT or GETATTR to
correct the cached file size.

Here's a function graph of that: http://people.redhat.com/bcodding/borken

We could invalidate the attribute cache every time a write completes.  Maybe
nfs_writeback_update_inode() isn't doing the job for block layouts (are we
not setting res.count?  I'll look at that.)

I think we could also use the LAYOUTCOMMIT results to invalidate the cache,
RFC 5661 18.42.3:
  "If the metadata server updates the file's size as the
  result of the LAYOUTCOMMIT operation, it must return the new size
  (locr_newsize.ns_size) as part of the results."

I'm not sure you want to bother to try to reproduce -- but if so, you don't
need special hardware for SCSI layout.  I wrote up a quick how-to for SCSI
layouts on a VM in qemu:
http://people.redhat.com/bcodding/pnfs/nfs/scsi/2016/07/13/pnfs_scsi_setup_for_VMs/

Ben
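The sequence described in the message above (a write completes, but a later stat() is answered from attributes that are still marked valid) can be sketched as a small userspace model. This is only an illustration of the hypothesis at this point in the thread, not kernel code: every type and function name below (model_inode, model_server, model_write_done, model_getattr, model_layoutcommit) is hypothetical, and the 'invalidate' switch stands in for the suggestion to invalidate the attribute cache on every write completion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model types; none of these names exist in the kernel. */
struct model_server {
	uint64_t size;                /* file size as the metadata server sees it */
};

struct model_inode {
	uint64_t cached_size;         /* size the client would report to stat() */
	bool     attrs_valid;         /* false: the next getattr must revalidate */
	bool     layoutcommit_pending;
	uint64_t last_write_byte;     /* end offset of the highest completed write */
};

/* Push the client's last written offset to the server (models LAYOUTCOMMIT). */
void model_layoutcommit(struct model_inode *ino, struct model_server *srv)
{
	assert(ino && srv);
	if (!ino->layoutcommit_pending)
		return;
	if (ino->last_write_byte > srv->size)
		srv->size = ino->last_write_byte;
	ino->layoutcommit_pending = false;
}

/*
 * A pNFS write finished.  'invalidate' selects whether the completion also
 * marks the cached attributes stale.
 */
void model_write_done(struct model_inode *ino, uint64_t offset,
		      uint64_t count, bool invalidate)
{
	uint64_t end = offset + count;

	if (end > ino->last_write_byte)
		ino->last_write_byte = end;
	ino->layoutcommit_pending = true;
	if (invalidate)
		ino->attrs_valid = false;
}

/* Only talk to the server when the cache is marked stale. */
uint64_t model_getattr(struct model_inode *ino, struct model_server *srv)
{
	if (!ino->attrs_valid) {
		model_layoutcommit(ino, srv);   /* LAYOUTCOMMIT, then ... */
		ino->cached_size = srv->size;   /* ... GETATTR refreshes the cache */
		ino->attrs_valid = true;
	}
	return ino->cached_size;
}
```

In this model, a second 4k write completed with invalidate == false leaves model_getattr() returning 4096, while the same write with invalidate == true makes the next model_getattr() send the commit and return 8192.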
* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics 2016-07-26 17:57 ` Benjamin Coddington @ 2016-07-26 18:07 ` Trond Myklebust 2016-07-27 11:55 ` Benjamin Coddington 0 siblings, 1 reply; 69+ messages in thread From: Trond Myklebust @ 2016-07-26 18:07 UTC (permalink / raw) To: Coddington Benjamin; +Cc: hch, List Linux NFS Mailing

> On Jul 26, 2016, at 13:57, Benjamin Coddington <bcodding@redhat.com> wrote:
>
> On 26 Jul 2016, at 12:35, Trond Myklebust wrote:
>
>>> On Jul 26, 2016, at 12:32, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>
>>> On 25 Jul 2016, at 14:41, Benjamin Coddington wrote:
>>>
>>>> On 25 Jul 2016, at 14:34, Trond Myklebust wrote:
>>>>
>>>>>> On Jul 25, 2016, at 14:26, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 25 Jul 2016, at 12:39, Trond Myklebust wrote:
>>>>>>
>>>>>>>> On Jul 25, 2016, at 12:26, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>>>>>
>>>>>>>> On 21 Jul 2016, at 9:20, Trond Myklebust wrote:
>>>>>>>>
>>>>>>>>>> On Jul 21, 2016, at 09:05, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>>>>>>>
>>>>>>>>>> So back to Christoph's point earlier:
>>>>>>>>>>
>>>>>>>>>> On 17 Jul 2016, at 23:48, Christoph Hellwig wrote:
>>>>>>>>>>> This one breaks xfstests generic/207 on block/scsi layout for me.  The
>>>>>>>>>>> reason for that is that we need a layoutcommit after writing out all
>>>>>>>>>>> data for the file for the file size to be updated on the server.
>>>>>>>>>>
>>>>>>>>>> You responded:
>>>>>>>>>>
>>>>>>>>>> On 18 Jul 2016, at 0:32, Trond Myklebust wrote:
>>>>>>>>>>> I’m not understanding this argument. Why do we care if the file size is up
>>>>>>>>>>> to date on the server if we’re not sending an actual GETATTR on the wire
>>>>>>>>>>> to retrieve the file size?
>>>>>>>>>>
>>>>>>>>>> I guess the answer might be because we can get it back from the last
>>>>>>>>>> LAYOUTCOMMIT.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The patch that I followed up with should now ensure that we do not mark the attribute cache as up to date if there is a LAYOUTCOMMIT pending.
>>>>>>>>> IOW: when the pNFS write is done, it is expected to do 2 things:
>>>>>>>>>
>>>>>>>>> 1) mark the inode for LAYOUTCOMMIT
>>>>>>>>> 2) mark the attribute cache as invalid (because we know the change attribute, mtime, ctime need to be updates)
>>>>>>>>>
>>>>>>>>> In the case of blocks pNFS write:
>>>>>>>>> The call to pnfs_set_layoutcommit() in pnfs_ld_write_done() should take care of (1)
>>>>>>>>> The call to nfs_writeback_update_inode() in nfs4_write_done_cb() should take care of (2).
>>>>>>>>>
>>>>>>>>> Provided that these 2 calls are performed in the above order, then any call to nfs_getattr() which has not been preceded by a call to nfs4_proc_layoutcommit() should trigger the call to __nfs_revalidate_inode().
>>>>>>>>
>>>>>>>> I think the problem is that a following nfs_getattr() will fail to notice
>>>>>>>> the size change in the case of a write_completion and layoutcommit occuring
>>>>>>>> after nfs_getattr() has done pnfs_sync_inode() but before it has done
>>>>>>>> nfs_update_inode().
>>>>>>>>
>>>>>>>> In the failing case there are two threads one is doing writes, the other
>>>>>>>> doing lstat on aio_complete via io_getevents(2).
>>>>>>>>
>>>>>>>> For each write completion the lstat thread tries to verify the file size.
>>>>>>>>
>>>>>>>> GETATTR Thread                  LAYOUTCOMMIT Thread
>>>>>>>> --------------                  --------------------
>>>>>>>>                             write_completion sets LAYOUTCOMMIT (4096@0)
>>>>>>>> --> nfs_getattr
>>>>>>>
>>>>>>> filemap_write_and_wait()
>>>>>>>
>>>>>>>> __nfs_revalidate_inode
>>>>>>>> pnfs_sync_inode
>>>>>>>
>>>>>>> NFS_PROTO(inode)->getattr()
>>>>>>>
>>>>>>>> getattr sees 4096
>>>>>>>>                             write_completion sets LAYOUTCOMMIT (4096@4096)
>>>>>>>>                             sets LAYOUTCOMMITING
>>>>>>>>                             clears LAYOUTCOMMIT
>>>>>>>>                             clears LAYOUTCOMMITTING
>>>>>>>> nfs_refresh_inode
>>>>>>>> nfs_update_inode size is 4096
>>>>>>>> <-- nfs_getattr
>>>>>>>>
>>>>>>>> At this point the cached attributes are seen as up to date, but
>>>>>>>> aio-dio-extend-stat program expects that second write_completion to reflect
>>>>>>>> in the file size.
>>>>>>>>
>>>>>>>
>>>>>>> Why isn’t the filemap_write_and_wait() above resolving the race? I’d
>>>>>>> expect that would move your “write completion sets LAYOUTCOMMIT” up to
>>>>>>> before the pnfs_sync_inode().  In fact, in the patch that Christoph sent,
>>>>>>> all he was doing was moving the pnfs_sync_inode() to immediately after
>>>>>>> that filemap_write_and_wait() instead of relying on it in
>>>>>>> _nfs_revalidate_inode.
>>>>>>
>>>>>> This is O_DIRECT, I've failed to mention yet.  The second write hasn't made
>>>>>> it out of __nfs_pageio_add_request() at the time filemap_write_and_wait() is
>>>>>> called.  It is sleeping in pnfs_update_layout() waiting on a LAYOUTGET and it
>>>>>> doesn't resumes until after filemap_write_and_wait().
>>>>>
>>>>> Wait, so you have 1 thread doing an O_DIRECT write() and another doing a
>>>>> stat() in parallel? Why would there be an expectation that the filesystem
>>>>> should serialise those system calls?
>>>>
>>>> Not exactly parallel, but synchronized on aio_complete.  A comment in
>>>> generic/207's src/aio-dio-regress/aio-dio-extend-stat.c:
>>>>
>>>> 36 /*
>>>> 37  * This was originally submitted to
>>>> 38  * http://bugzilla.kernel.org/show_bug.cgi?id=6831 by
>>>> 39  * Rafal Wijata <wijata@nec-labs.com>.  It caught a race in dio aio completion
>>>> 40  * that would call aio_complete() before the dio callers would update i_size.
>>>> 41  * A stat after io_getevents() would not see the new file size.
>>>> 42  *
>>>> 43  * The bug was fixed in the fs/direct-io.c completion reworking that appeared
>>>> 44  * in 2.6.20.  This test should fail on 2.6.19.
>>>> 45  */
>>>>
>>>> As far as I can see, this check is the whole point of generic/207..
>>>
>>> This would fix it up:
>>>
>>> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
>>> index f108d58101f8..823700f827b6 100644
>>> --- a/fs/nfs/inode.c
>>> +++ b/fs/nfs/inode.c
>>> @@ -661,6 +661,7 @@ int nfs_getattr(struct vfsmount *mnt, struct dentry
>>> *dentry, struct kstat *stat)
>>>
>>>       trace_nfs_getattr_enter(inode);
>>>       /* Flush out writes to the server in order to update c/mtime.  */
>>> +       nfs_start_io_read(inode);
>>>       if (S_ISREG(inode->i_mode)) {
>>>               err = filemap_write_and_wait(inode->i_mapping);
>>>               if (err)
>>> @@ -694,6 +695,7 @@ int nfs_getattr(struct vfsmount *mnt, struct dentry
>>> *dentry, struct kstat *stat)
>>>                       stat->blksize = NFS_SERVER(inode)->dtsize;
>>>       }
>>> out:
>>> +       nfs_end_io_read(inode);
>>>       trace_nfs_getattr_exit(inode, err);
>>>       return err;
>>> }
>>>
>>> Trond, what do you think?  I'll take any additional silence as a sign to go
>>> elsewhere.  :P
>>
>> No. The above locking excludes all writes as well as O_DIRECT reads… That’s worse than we had before.
>>
>> I’d like rather to understand _why_ the aio_complete() is failing to work correctly here. According to the analysis of the test case that you quoted yesterday, the O_DIRECT writes should have completed before we even call stat().
>>
>> Cheers
>>  Trond
>
> The O_DIRECT writes do complete, and every completion signals the other
> thread to do stat(), but that completion does not update the size on the
> server.  As we know, we need a LAYOUTCOMMIT.  After this patch, we're only
> going to do a LAYOUTCOMMIT if nfs_need_revalidate_inode(inode).
>

So how is the completion happening? As far as I know, this is what is supposed to happen:

- bl_write_cleanup() calls pnfs_ld_write_done(),
    - pnfs_ld_write_done then first calls pnfs_set_layoutcommit() and nfs_pgio_result() (which calls nfs_writeback_done() and eventually nfs_writeback_update_inode())
    - pnfs_ld_write_done then calls nfs_pgio_release(), which again calls nfs_direct_write_completion().

Is something setting hdr->pnfs_error and preventing the call to pnfs_set_layoutcommit and nfs_writeback_done()? If not, then why is the call to nfs_writeback_update_inode() not setting the file size?

> So what happens is that the first write completes, and the first
> nfs_getattr() triggers the first LAYOUTCOMMIT, and then a GETATTR.
> Simultaneously, the second write is waiting on a LAYOUTGET.  The GETATTR
> completes and sets the size to 4k.
>
> Now the attributes are marked up to date with a size of 4k, and when the
> second write completes, nfs_getattr() is called, nfs_need_revalidate_inode()
> is false, and we don't bother to send another LAYOUTCOMMIT or GETATTR to
> correct the cached file size.
>
> Here's a function graph of that: http://people.redhat.com/bcodding/borken
>
> We could invalidate the attribute cache every time a write completes.. maybe
> nfs_writeback_update_inode() isn't doing the job for block layouts (are we
> not setting res.count?  I'll look at that..)
>
> I think we could also use the LAYOUTCOMMIT results to invalidate the cache,
> RFC-5661 18.42.3:
>   If the metadata server updates the file's size as the
>   result of the LAYOUTCOMMIT operation, it must return the new size
>   (locr_newsize.ns_size) as part of the results."
>
> I'm not sure you want to bother to try to reproduce -- but if so, you don't
> need special hardware for SCSI layout.  I wrote up a quick how-to for SCSI
> layouts on a VM in qemu:
> http://people.redhat.com/bcodding/pnfs/nfs/scsi/2016/07/13/pnfs_scsi_setup_for_VMs/
>
> Ben
* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics 2016-07-26 18:07 ` Trond Myklebust @ 2016-07-27 11:55 ` Benjamin Coddington 2016-07-27 12:15 ` Trond Myklebust 0 siblings, 1 reply; 69+ messages in thread From: Benjamin Coddington @ 2016-07-27 11:55 UTC (permalink / raw) To: Trond Myklebust; +Cc: hch, List Linux NFS Mailing On 26 Jul 2016, at 14:07, Trond Myklebust wrote: >> On Jul 26, 2016, at 13:57, Benjamin Coddington <bcodding@redhat.com> >> wrote: >> >> On 26 Jul 2016, at 12:35, Trond Myklebust wrote: >> >>>> On Jul 26, 2016, at 12:32, Benjamin Coddington >>>> <bcodding@redhat.com> wrote: >>>> >>>> On 25 Jul 2016, at 14:41, Benjamin Coddington wrote: >>>> >>>>> On 25 Jul 2016, at 14:34, Trond Myklebust wrote: >>>>> >>>>>>> On Jul 25, 2016, at 14:26, Benjamin Coddington >>>>>>> <bcodding@redhat.com> wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 25 Jul 2016, at 12:39, Trond Myklebust wrote: >>>>>>> >>>>>>>>> On Jul 25, 2016, at 12:26, Benjamin Coddington >>>>>>>>> <bcodding@redhat.com> wrote: >>>>>>>>> >>>>>>>>> On 21 Jul 2016, at 9:20, Trond Myklebust wrote: >>>>>>>>> >>>>>>>>>>> On Jul 21, 2016, at 09:05, Benjamin Coddington >>>>>>>>>>> <bcodding@redhat.com> wrote: >>>>>>>>>>> >>>>>>>>>>> So back to Christoph's point earlier: >>>>>>>>>>> >>>>>>>>>>> On 17 Jul 2016, at 23:48, Christoph Hellwig wrote: >>>>>>>>>>>> This one breaks xfstests generic/207 on block/scsi layout >>>>>>>>>>>> for me. The >>>>>>>>>>>> reason for that is that we need a layoutcommit after >>>>>>>>>>>> writing out all >>>>>>>>>>>> data for the file for the file size to be updated on the >>>>>>>>>>>> server. >>>>>>>>>>> >>>>>>>>>>> You responded: >>>>>>>>>>> >>>>>>>>>>> On 18 Jul 2016, at 0:32, Trond Myklebust wrote: >>>>>>>>>>>> I’m not understanding this argument. Why do we care if >>>>>>>>>>>> the file size is up >>>>>>>>>>>> to date on the server if we’re not sending an actual >>>>>>>>>>>> GETATTR on the wire >>>>>>>>>>>> to retrieve the file size? 
>>>>>>>>>>> >>>>>>>>>>> I guess the answer might be because we can get it back from >>>>>>>>>>> the last >>>>>>>>>>> LAYOUTCOMMIT. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> The patch that I followed up with should now ensure that we >>>>>>>>>> do not mark the attribute cache as up to date if there is a >>>>>>>>>> LAYOUTCOMMIT pending. >>>>>>>>>> IOW: when the pNFS write is done, it is expected to do 2 >>>>>>>>>> things: >>>>>>>>>> >>>>>>>>>> 1) mark the inode for LAYOUTCOMMIT >>>>>>>>>> 2) mark the attribute cache as invalid (because we know the >>>>>>>>>> change attribute, mtime, ctime need to be updates) >>>>>>>>>> >>>>>>>>>> In the case of blocks pNFS write: >>>>>>>>>> The call to pnfs_set_layoutcommit() in pnfs_ld_write_done() >>>>>>>>>> should take care of (1) >>>>>>>>>> The call to nfs_writeback_update_inode() in >>>>>>>>>> nfs4_write_done_cb() should take care of (2). >>>>>>>>>> >>>>>>>>>> Provided that these 2 calls are performed in the above order, >>>>>>>>>> then any call to nfs_getattr() which has not been preceded by >>>>>>>>>> a call to nfs4_proc_layoutcommit() should trigger the call to >>>>>>>>>> __nfs_revalidate_inode(). >>>>>>>>> >>>>>>>>> I think the problem is that a following nfs_getattr() will >>>>>>>>> fail to notice >>>>>>>>> the size change in the case of a write_completion and >>>>>>>>> layoutcommit occuring >>>>>>>>> after nfs_getattr() has done pnfs_sync_inode() but before it >>>>>>>>> has done >>>>>>>>> nfs_update_inode(). >>>>>>>>> >>>>>>>>> In the failing case there are two threads one is doing writes, >>>>>>>>> the other >>>>>>>>> doing lstat on aio_complete via io_getevents(2). >>>>>>>>> >>>>>>>>> For each write completion the lstat thread tries to verify the >>>>>>>>> file size. 
>>>>>>>>> >>>>>>>>> GETATTR Thread LAYOUTCOMMIT Thread >>>>>>>>> -------------- -------------------- >>>>>>>>> write_completion sets LAYOUTCOMMIT >>>>>>>>> (4096@0) >>>>>>>>> --> nfs_getattr >>>>>>>> >>>>>>>> filemap_write_and_wait() >>>>>>>> >>>>>>>>> __nfs_revalidate_inode >>>>>>>>> pnfs_sync_inode >>>>>>>> >>>>>>>> NFS_PROTO(inode)->getattr() >>>>>>>> >>>>>>>>> getattr sees 4096 >>>>>>>>> write_completion sets LAYOUTCOMMIT >>>>>>>>> (4096@4096) >>>>>>>>> sets LAYOUTCOMMITING >>>>>>>>> clears LAYOUTCOMMIT >>>>>>>>> clears LAYOUTCOMMITTING >>>>>>>>> nfs_refresh_inode >>>>>>>>> nfs_update_inode size is 4096 >>>>>>>>> <-- nfs_getattr >>>>>>>>> >>>>>>>>> At this point the cached attributes are seen as up to date, >>>>>>>>> but >>>>>>>>> aio-dio-extend-stat program expects that second >>>>>>>>> write_completion to reflect >>>>>>>>> in the file size. >>>>>>>>> >>>>>>>> >>>>>>>> Why isn’t the filemap_write_and_wait() above resolving the >>>>>>>> race? I’d >>>>>>>> expect that would move your “write completion sets >>>>>>>> LAYOUTCOMMIT” up to >>>>>>>> before the pnfs_sync_inode(). In fact, in the patch that >>>>>>>> Christoph sent, >>>>>>>> all he was doing was moving the pnfs_sync_inode() to >>>>>>>> immediately after >>>>>>>> that filemap_write_and_wait() instead of relying on it in >>>>>>>> _nfs_revalidate_inode. >>>>>>> >>>>>>> This is O_DIRECT, I've failed to mention yet. The second write >>>>>>> hasn't made >>>>>>> it out of __nfs_pageio_add_request() at the time >>>>>>> filemap_write_and_wait() is >>>>>>> called. It is sleeping in pnfs_update_layout() waiting on a >>>>>>> LAYOUTGET and it >>>>>>> doesn't resumes until after filemap_write_and_wait(). >>>>>> >>>>>> Wait, so you have 1 thread doing an O_DIRECT write() and another >>>>>> doing a >>>>>> stat() in parallel? Why would there be an expectation that the >>>>>> filesystem >>>>>> should serialise those system calls? >>>>> >>>>> Not exactly parallel, but synchronized on aio_complete. 
A comment >>>>> in >>>>> generic/207's src/aio-dio-regress/aio-dio-extend-stat.c: >>>>> >>>>> 36 /* >>>>> 37 * This was originally submitted to >>>>> 38 * http://bugzilla.kernel.org/show_bug.cgi?id=6831 by >>>>> 39 * Rafal Wijata <wijata@nec-labs.com>. It caught a race in dio >>>>> aio completion >>>>> 40 * that would call aio_complete() before the dio callers would >>>>> update i_size. >>>>> 41 * A stat after io_getevents() would not see the new file size. >>>>> 42 * >>>>> 43 * The bug was fixed in the fs/direct-io.c completion reworking >>>>> that appeared >>>>> 44 * in 2.6.20. This test should fail on 2.6.19. >>>>> 45 */ >>>>> >>>>> As far as I can see, this check is the whole point of >>>>> generic/207.. >>>> >>>> This would fix it up: >>>> >>>> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c >>>> index f108d58101f8..823700f827b6 100644 >>>> --- a/fs/nfs/inode.c >>>> +++ b/fs/nfs/inode.c >>>> @@ -661,6 +661,7 @@ int nfs_getattr(struct vfsmount *mnt, struct >>>> dentry >>>> *dentry, struct kstat *stat) >>>> >>>> trace_nfs_getattr_enter(inode); >>>> /* Flush out writes to the server in order to update c/mtime. >>>> */ >>>> + nfs_start_io_read(inode); >>>> if (S_ISREG(inode->i_mode)) { >>>> err = filemap_write_and_wait(inode->i_mapping); >>>> if (err) >>>> @@ -694,6 +695,7 @@ int nfs_getattr(struct vfsmount *mnt, struct >>>> dentry >>>> *dentry, struct kstat *stat) >>>> stat->blksize = NFS_SERVER(inode)->dtsize; >>>> } >>>> out: >>>> + nfs_end_io_read(inode); >>>> trace_nfs_getattr_exit(inode, err); >>>> return err; >>>> } >>>> >>>> Trond, what do you think? I'll take any additional silence as a >>>> sign to go >>>> elsewhere. :P >>> >>> No. The above locking excludes all writes as well as O_DIRECT >>> reads… That’s worse than we had before. >>> >>> I’d like rather to understand _why_ the aio_complete() is failing >>> to work correctly here. 
According to the analysis of the test case >>> that you quoted yesterday, the O_DIRECT writes should have completed >>> before we even call stat(). >>> >>> Cheers >>> Trond >> >> The O_DIRECT writes do complete, and every completion signals the >> other >> thread to do stat(), but that completion does not update the size on >> the >> server. As we know, we need a LAYOUTCOMMIT. After this patch, we're >> only >> going to do a LAYOUTCOMMIT if nfs_need_revalidate_inode(inode). >> > > So how is the completion happening? As far as I know, this is what is > supposed to happen: > > - bl_write_cleanup() calls pnfs_ld_write_done(), > - pnfs_ld_write_done then first calls pnfs_set_layoutcommit() and > nfs_pgio_result() (which calls nfs_writeback_done() and eventually > nfs_writeback_update_inode()) > - pnfs_ld_write_done then calls nfs_pgio_release(), which again > calls nfs_direct_write_completion(). > > Is something setting hdr->pnfs_error and preventing the call to > pnfs_set_layoutcommit and nfs_writeback_done()? If not, then why is > the call to nfs_writeback_update_inode() not setting the file size? After adding more debugging, I see that all of that is working correctly, but the first LAYOUTCOMMIT is taking the size back down to 4096 from the last nfs_writeback_done(), and the second LAYOUTCOMMIT never brings it back up again. Now I see that we should be marking the block extents as written atomically with setting LAYOUTCOMMIT and nfsi->layout->plh_lwb, otherwise a LAYOUTCOMMIT can collect extents just added from the next bl_write_cleanup(). Then, the next LAYOUTCOMMIT fails, and all we're left with is the size from the first LAYOUTCOMMIT. Not sure if that particular problem is the whole fix, but that's something to work on. I see ways to fix that: - make a new pnfs_set_layoutcommit_locked() that can be used to call ext_tree_mark_written() inside the i_lock - make another pnfs_layoutdriver_type operation to be used within pnfs_set_layoutcommit (mark_layoutcommit? 
set_layoutcommit?), and call ext_tree_mark_written() within that.. - have .prepare_layoutcommit return a new positive plh_lwb that would extend the current LAYOUTCOMMIT - make ext_tree_prepare_commit only encode up to plh_lwb Ben ^ permalink raw reply [flat|nested] 69+ messages in thread
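The race described above can be illustrated with a small toy model. The following C sketch is not kernel code: every `toy_*` name is hypothetical, offsets are simplified to "end of written range", and a LAYOUTCOMMIT that finds no pending extents is assumed to have no effect on the server-side size. It only shows why marking extents written and updating the last-write-byte in two separate steps lets a commit pick up extents that its recorded size does not cover, and how extending the committed last-written byte to cover the encoded extents (one of the options listed above) would avoid the stale size in this model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the layout state; not a kernel structure. */
struct toy_layout {
    uint64_t written_end;  /* end of extents marked written, not yet committed */
    uint64_t lwb;          /* loosely models plh_lwb (as an end offset) */
    uint64_t server_size;  /* file size the metadata server would report */
};

/* Models the extent tree being updated when a write completes. */
static void toy_mark_written(struct toy_layout *lo, uint64_t end)
{
    if (end > lo->written_end)
        lo->written_end = end;
}

/* Models the last-write-byte being recorded for the next commit. */
static void toy_set_layoutcommit(struct toy_layout *lo, uint64_t end)
{
    if (end > lo->lwb)
        lo->lwb = end;
}

/*
 * One LAYOUTCOMMIT. It consumes every extent currently marked written.
 * With extend == true, the committed last-written byte is raised to cover
 * the extents actually encoded.
 */
static void toy_layoutcommit(struct toy_layout *lo, bool extend)
{
    uint64_t extents_end = lo->written_end;
    uint64_t lastbytewritten = lo->lwb;

    if (extents_end == 0)
        return;  /* assumption: nothing to encode, so no size update */
    if (extend && extents_end > lastbytewritten)
        lastbytewritten = extents_end;
    if (lastbytewritten > lo->server_size)
        lo->server_size = lastbytewritten;
    lo->written_end = 0;  /* extents are consumed by this commit */
}

/* Replays the interleaving and returns the final server-side size. */
static uint64_t toy_race(bool extend)
{
    struct toy_layout lo = {0};

    /* First write completes both steps. */
    toy_mark_written(&lo, 4096);
    toy_set_layoutcommit(&lo, 4096);

    /* Second write has marked its extents but not yet updated lwb. */
    toy_mark_written(&lo, 8192);
    toy_layoutcommit(&lo, extend);    /* first commit runs in the gap */
    toy_set_layoutcommit(&lo, 8192);
    toy_layoutcommit(&lo, extend);    /* second commit finds no extents */

    return lo.server_size;
}
```

In this model `toy_race(false)` ends with a size of 4096 even though 8192 bytes were written, while `toy_race(true)` ends with 8192.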
* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics 2016-07-27 11:55 ` Benjamin Coddington @ 2016-07-27 12:15 ` Trond Myklebust 2016-07-27 12:31 ` Trond Myklebust 0 siblings, 1 reply; 69+ messages in thread From: Trond Myklebust @ 2016-07-27 12:15 UTC (permalink / raw) To: Coddington Benjamin; +Cc: hch, List Linux NFS Mailing

> On Jul 27, 2016, at 07:55, Benjamin Coddington <bcodding@redhat.com> wrote:
>
> After adding more debugging, I see that all of that is working correctly,
> but the first LAYOUTCOMMIT is taking the size back down to 4096 from the
> last nfs_writeback_done(), and the second LAYOUTCOMMIT never brings it back
> up again.
>

Excellent! Thanks for debugging that.

> Now I see that we should be marking the block extents as written atomically with
> setting LAYOUTCOMMIT and nfsi->layout->plh_lwb, otherwise a LAYOUTCOMMIT can
> collect extents just added from the next bl_write_cleanup().  Then, the next
> LAYOUTCOMMIT fails, and all we're left with is the size from the first
> LAYOUTCOMMIT.  Not sure if that particular problem is the whole fix, but
> that's something to work on.
>
> I see ways to fix that:
>
>    - make a new pnfs_set_layoutcommit_locked() that can be used to call
>      ext_tree_mark_written() inside the i_lock
>
>    - make another pnfs_layoutdriver_type operation to be used within
>      pnfs_set_layoutcommit (mark_layoutcommit? set_layoutcommit?), and call
>      ext_tree_mark_written() within that..
>
>    - have .prepare_layoutcommit return a new positive plh_lwb that would
>      extend the current LAYOUTCOMMIT
>
>    - make ext_tree_prepare_commit only encode up to plh_lwb

I see no reason why ext_tree_prepare_commit() shouldn’t be allowed to extend the args->lastbytewritten. This is a metadata operation that is owned by the pNFS layout driver.
The only thing I’d note is you should then rewrite the failure case in pnfs_layoutcommit_inode() so that it doesn’t rely on the saved “end_pos”, but uses args->lastbytewritten instead (with a comment to the effect why)…

Cheers
  Trond

^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics 2016-07-27 12:15 ` Trond Myklebust @ 2016-07-27 12:31 ` Trond Myklebust 2016-07-27 16:14 ` Benjamin Coddington 0 siblings, 1 reply; 69+ messages in thread From: Trond Myklebust @ 2016-07-27 12:31 UTC (permalink / raw) To: Coddington Benjamin; +Cc: hch, List Linux NFS Mailing

> On Jul 27, 2016, at 08:15, Trond Myklebust <trondmy@primarydata.com> wrote:
>
>
>> On Jul 27, 2016, at 07:55, Benjamin Coddington <bcodding@redhat.com> wrote:
>>
>> After adding more debugging, I see that all of that is working correctly,
>> but the first LAYOUTCOMMIT is taking the size back down to 4096 from the
>> last nfs_writeback_done(), and the second LAYOUTCOMMIT never brings it back
>> up again.
>>
>
> Excellent! Thanks for debugging that.
>
>> Now I see that we should be marking the block extents as written atomically with
>> setting LAYOUTCOMMIT and nfsi->layout->plh_lwb, otherwise a LAYOUTCOMMIT can
>> collect extents just added from the next bl_write_cleanup().  Then, the next
>> LAYOUTCOMMIT fails, and all we're left with is the size from the first
>> LAYOUTCOMMIT.  Not sure if that particular problem is the whole fix, but
>> that's something to work on.
>>
>> I see ways to fix that:
>>
>>   - make a new pnfs_set_layoutcommit_locked() that can be used to call
>>     ext_tree_mark_written() inside the i_lock
>>
>>   - make another pnfs_layoutdriver_type operation to be used within
>>     pnfs_set_layoutcommit (mark_layoutcommit? set_layoutcommit?), and call
>>     ext_tree_mark_written() within that..
>>
>>   - have .prepare_layoutcommit return a new positive plh_lwb that would
>>     extend the current LAYOUTCOMMIT
>>
>>   - make ext_tree_prepare_commit only encode up to plh_lwb
>
> I see no reason why ext_tree_prepare_commit() shouldn’t be allowed to extend the args->lastbytewritten. This is a metadata operation that is owned by the pNFS layout driver.
> The only thing I’d note is you should then rewrite the failure case in pnfs_layoutcommit_inode() so that it doesn’t rely on the saved “end_pos”, but uses args->lastbytewritten instead (with a comment to the effect why)…

In fact, given the potential for races here, I think the right thing to do is to have ext_tree_prepare_commit() always set the correct value for args->lastbytewritten.

Cheers
  Trond

^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics 2016-07-27 12:31 ` Trond Myklebust @ 2016-07-27 16:14 ` Benjamin Coddington 2016-07-27 18:05 ` Trond Myklebust 0 siblings, 1 reply; 69+ messages in thread From: Benjamin Coddington @ 2016-07-27 16:14 UTC (permalink / raw) To: Trond Myklebust; +Cc: hch, List Linux NFS Mailing

On 27 Jul 2016, at 8:31, Trond Myklebust wrote:

>> On Jul 27, 2016, at 08:15, Trond Myklebust <trondmy@primarydata.com> wrote:
>>
>>
>>> On Jul 27, 2016, at 07:55, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>
>>> After adding more debugging, I see that all of that is working correctly,
>>> but the first LAYOUTCOMMIT is taking the size back down to 4096 from the
>>> last nfs_writeback_done(), and the second LAYOUTCOMMIT never brings it back
>>> up again.
>>>
>>
>> Excellent! Thanks for debugging that.
>>
>>> Now I see that we should be marking the block extents as written atomically with
>>> setting LAYOUTCOMMIT and nfsi->layout->plh_lwb, otherwise a LAYOUTCOMMIT can
>>> collect extents just added from the next bl_write_cleanup().  Then, the next
>>> LAYOUTCOMMIT fails, and all we're left with is the size from the first
>>> LAYOUTCOMMIT.  Not sure if that particular problem is the whole fix, but
>>> that's something to work on.
>>>
>>> I see ways to fix that:
>>>
>>>  - make a new pnfs_set_layoutcommit_locked() that can be used to call
>>>    ext_tree_mark_written() inside the i_lock
>>>
>>>  - make another pnfs_layoutdriver_type operation to be used within
>>>    pnfs_set_layoutcommit (mark_layoutcommit? set_layoutcommit?), and call
>>>    ext_tree_mark_written() within that..
>>>
>>>  - have .prepare_layoutcommit return a new positive plh_lwb that would
>>>    extend the current LAYOUTCOMMIT
>>>
>>>  - make ext_tree_prepare_commit only encode up to plh_lwb
>>
>> I see no reason why ext_tree_prepare_commit() shouldn’t be allowed to extend the args->lastbytewritten. This is a metadata operation that is owned by the pNFS layout driver.
>> The only thing I’d note is you should then rewrite the failure case in pnfs_layoutcommit_inode() so that it doesn’t rely on the saved “end_pos”, but uses args->lastbytewritten instead (with a comment to the effect why)…
>
> In fact, given the potential for races here, I think the right thing to do is to have ext_tree_prepare_commit() always set the correct value for args->lastbytewritten.

OK, that has cleared up that common failure case that was getting in the
way, but now it can still fail like this:

nfs_writeback_update_inode sets size 4096 w/ NFS_INO_INVALID_ATTR set, and sets NFS_INO_LAYOUTCOMMIT
1st nfs_getattr -> pnfs_layoutcommit_inode starts, clears layoutcommit flag, sets NFS_INO_LAYOUTCOMMITTING
nfs_writeback_update_inode sets size 8192 w/ NFS_INO_INVALID_ATTR set, and sets NFS_INO_LAYOUTCOMMIT
1st nfs_getattr -> nfs4_layoutcommit_release sets size 4096, NFS_INO_INVALID_ATTR set, clears NFS_INO_LAYOUTCOMMITTING
1st nfs_getattr -> __revalidate_inode sets size 4096, NFS_INO_INVALID_ATTR not set.. cache is valid
2nd nfs_getattr immediately returns 4096 even though NFS_INO_LAYOUTCOMMIT

Ben

^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics 2016-07-27 16:14 ` Benjamin Coddington @ 2016-07-27 18:05 ` Trond Myklebust 2016-07-28 9:47 ` Benjamin Coddington 0 siblings, 1 reply; 69+ messages in thread From: Trond Myklebust @ 2016-07-27 18:05 UTC (permalink / raw) To: Coddington Benjamin; +Cc: hch, List Linux NFS Mailing

> On Jul 27, 2016, at 12:14, Benjamin Coddington <bcodding@redhat.com> wrote:
>
> On 27 Jul 2016, at 8:31, Trond Myklebust wrote:
>
>>> On Jul 27, 2016, at 08:15, Trond Myklebust <trondmy@primarydata.com> wrote:
>>>
>>>
>>>> On Jul 27, 2016, at 07:55, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>
>>>> After adding more debugging, I see that all of that is working correctly,
>>>> but the first LAYOUTCOMMIT is taking the size back down to 4096 from the
>>>> last nfs_writeback_done(), and the second LAYOUTCOMMIT never brings it back
>>>> up again.
>>>>
>>>
>>> Excellent! Thanks for debugging that.
>>>
>>>> Now I see that we should be marking the block extents as written atomically with
>>>> setting LAYOUTCOMMIT and nfsi->layout->plh_lwb, otherwise a LAYOUTCOMMIT can
>>>> collect extents just added from the next bl_write_cleanup().  Then, the next
>>>> LAYOUTCOMMIT fails, and all we're left with is the size from the first
>>>> LAYOUTCOMMIT.  Not sure if that particular problem is the whole fix, but
>>>> that's something to work on.
>>>>
>>>> I see ways to fix that:
>>>>
>>>>  - make a new pnfs_set_layoutcommit_locked() that can be used to call
>>>>    ext_tree_mark_written() inside the i_lock
>>>>
>>>>  - make another pnfs_layoutdriver_type operation to be used within
>>>>    pnfs_set_layoutcommit (mark_layoutcommit? set_layoutcommit?), and call
>>>>    ext_tree_mark_written() within that..
>>>>
>>>>  - have .prepare_layoutcommit return a new positive plh_lwb that would
>>>>    extend the current LAYOUTCOMMIT
>>>>
>>>>  - make ext_tree_prepare_commit only encode up to plh_lwb
>>>
>>> I see no reason why ext_tree_prepare_commit() shouldn’t be allowed to extend the args->lastbytewritten. This is a metadata operation that is owned by the pNFS layout driver.
>>> The only thing I’d note is you should then rewrite the failure case in pnfs_layoutcommit_inode() so that it doesn’t rely on the saved “end_pos”, but uses args->lastbytewritten instead (with a comment to the effect why)…
>>
>> In fact, given the potential for races here, I think the right thing to do is to have ext_tree_prepare_commit() always set the correct value for args->lastbytewritten.
>
> OK, that has cleared up that common failure case that was getting in the
> way, but now it can still fail like this:
>

Good progress! :-)

> nfs_writeback_update_inode sets size 4096 w/ NFS_INO_INVALID_ATTR set, and sets NFS_INO_LAYOUTCOMMIT
> 1st nfs_getattr -> pnfs_layoutcommit_inode starts, clears layoutcommit flag sets NFS_INO_LAYOUTCOMMITING
> nfs_writeback_update_inode sets size 8192 w/ NFS_INO_INVALID_ATTR set, and sets NFS_INO_LAYOUTCOMMIT
> 1st nfs_getattr -> nfs4_layoutcommit_release sets size 4096, NFS_INO_INVALID_ATTR set, clears NFS_INO_LAYOUTCOMMITTING
> 1st nfs_getattr -> __revalidate_inode sets size 4096, NFS_INO_INVALID_ATTR not set.. cache is valid
> 2nd nfs_getattr immediately returns 4096 even though NFS_INO_LAYOUTCOMMIT

Is this being tested on top of the current linux-next/testing? Normally, I’d expect http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=10b7e9ad44881fcd46ac24eb7374377c6e8962ed to cause 1st nfs_getattr() to not declare the cache valid.

Cheers
  Trond

^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics 2016-07-27 18:05 ` Trond Myklebust @ 2016-07-28 9:47 ` Benjamin Coddington 2016-07-28 12:31 ` Trond Myklebust 0 siblings, 1 reply; 69+ messages in thread From: Benjamin Coddington @ 2016-07-28 9:47 UTC (permalink / raw) To: Trond Myklebust; +Cc: hch, List Linux NFS Mailing On 27 Jul 2016, at 14:05, Trond Myklebust wrote: >> On Jul 27, 2016, at 12:14, Benjamin Coddington <bcodding@redhat.com> >> wrote: >> >> On 27 Jul 2016, at 8:31, Trond Myklebust wrote: >> >>>> On Jul 27, 2016, at 08:15, Trond Myklebust >>>> <trondmy@primarydata.com> wrote: >>>> >>>> >>>>> On Jul 27, 2016, at 07:55, Benjamin Coddington >>>>> <bcodding@redhat.com> wrote: >>>>> >>>>> After adding more debugging, I see that all of that is working >>>>> correctly, >>>>> but the first LAYOUTCOMMIT is taking the size back down to 4096 >>>>> from the >>>>> last nfs_writeback_done(), and the second LAYOUTCOMMIT never >>>>> brings it back >>>>> up again. >>>>> >>>> >>>> Excellent! Thanks for debugging that. >>>> >>>>> Now I see that we should be marking the block extents as written >>>>> atomically with >>>>> setting LAYOUTCOMMIT and nfsi->layout->plh_lwb, otherwise a >>>>> LAYOUTCOMMIT can >>>>> collect extents just added from the next bl_write_cleanup(). >>>>> Then, the next >>>>> LAYOUTCOMMIT fails, and all we're left with is the size from the >>>>> first >>>>> LAYOUTCOMMIT. Not sure if that particular problem is the whole >>>>> fix, but >>>>> that's something to work on. >>>>> >>>>> I see ways to fix that: >>>>> >>>>> - make a new pnfs_set_layoutcommit_locked() that can be used to >>>>> call >>>>> ext_tree_mark_written() inside the i_lock >>>>> >>>>> - make another pnfs_layoutdriver_type operation to be used within >>>>> pnfs_set_layoutcommit (mark_layoutcommit? set_layoutcommit?), >>>>> and call >>>>> ext_tree_mark_written() within that.. 
>>>>> >>>>> - have .prepare_layoutcommit return a new positive plh_lwb that >>>>> would >>>>> extend the current LAYOUTCOMMIT >>>>> >>>>> - make ext_tree_prepare_commit only encode up to plh_lwb >>>> >>>> I see no reason why ext_tree_prepare_commit() shouldn’t be >>>> allowed to extend the args->lastbytewritten. This is a metadata >>>> operation that is owned by the pNFS layout driver. >>>> The only thing I’d note is you should then rewrite the failure >>>> case in pnfs_layoutcommit_inode() so that it doesn’t rely on the >>>> saved “end_pos”, but uses args->lastbytewritten instead (with a >>>> comment to the effect why)… >>> >>> In fact, given the potential for races here, I think the right thing >>> to do is to have ext_tree_prepare_commit() always set the correct >>> value for args->lastbytewritten. >> >> OK, that has cleared up that common failure case that was getting in >> the >> way, but now it can still fail like this: >> > > Good progress! :-) > >> nfs_writeback_update_inode sets size 4096 w/ NFS_INO_INVALID_ATTR >> set, and sets NFS_INO_LAYOUTCOMMIT >> 1st nfs_getattr -> pnfs_layoutcommit_inode starts, clears >> layoutcommit flag sets NFS_INO_LAYOUTCOMMITING >> nfs_writeback_update_inode sets size 8192 w/ NFS_INO_INVALID_ATTR >> set, and sets NFS_INO_LAYOUTCOMMIT >> 1st nfs_getattr -> nfs4_layoutcommit_release sets size 4096, >> NFS_INO_INVALID_ATTR set, clears NFS_INO_LAYOUTCOMMITTING >> 1st nfs_getattr -> __revalidate_inode sets size 4096, >> NFS_INO_INVALID_ATTR not set.. cache is valid >> 2nd nfs_getattr immediately returns 4096 even though >> NFS_INO_LAYOUTCOMMIT > > Is this being tested on top of the current linux-next/testing? > Normally, I’d expect > http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=10b7e9ad44881fcd46ac24eb7374377c6e8962ed > to cause 1st nfs_getattr() to not declare the cache valid. Yes, this is on your linux-next branch. 
When the 1st nfs_getattr() goes through nfs_update_inode() the second time (during __revalidate_inode), NFS_INO_INVALID_ATTR is never set by anything, since all the attributes returned match the cache. So even though NFS_INO_LAYOUTCOMMIT is set, and the cache_validity variable is "false", the NFS_INO_INVALID_ATTR is never set in the "invalid" local variable. Should pnfs_layoutcommit_outstanding() always set NFS_INO_INVALID_ATTR? Ben ^ permalink raw reply [flat|nested] 69+ messages in thread
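The failing sequence traced in this part of the thread can be replayed with a toy state machine. The C sketch below is a simplified illustration under stated assumptions, not the NFS client's logic: all `toy_*` names are hypothetical, the three booleans only loosely model NFS_INO_INVALID_ATTR, NFS_INO_LAYOUTCOMMIT and NFS_INO_LAYOUTCOMMITTING, and the `keep_invalid` switch is a hypothetical reading of the question asked above (keep the attribute cache invalid while a layoutcommit is still outstanding), not a fix that the thread settles on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the relevant inode state; not a kernel structure. */
struct toy_inode {
    uint64_t cached_size;   /* size held in the client's attribute cache */
    uint64_t server_size;   /* size the metadata server would report */
    uint64_t lwb;           /* last write end awaiting LAYOUTCOMMIT */
    bool attr_invalid;      /* loosely models NFS_INO_INVALID_ATTR */
    bool layoutcommit;      /* loosely models NFS_INO_LAYOUTCOMMIT */
    bool layoutcommitting;  /* loosely models NFS_INO_LAYOUTCOMMITTING */
};

/* A write completion grows the cached size and flags a pending commit. */
static void toy_write_done(struct toy_inode *in, uint64_t end)
{
    in->cached_size = end;
    in->lwb = end;
    in->attr_invalid = true;
    in->layoutcommit = true;
}

/* Start a LAYOUTCOMMIT and return the range snapshot that is in flight. */
static uint64_t toy_commit_start(struct toy_inode *in)
{
    in->layoutcommit = false;
    in->layoutcommitting = true;
    return in->lwb;
}

/* Finish a LAYOUTCOMMIT: server size and cache follow the snapshot. */
static void toy_commit_release(struct toy_inode *in, uint64_t committed)
{
    if (committed > in->server_size)
        in->server_size = committed;
    in->cached_size = in->server_size;
    in->attr_invalid = true;
    in->layoutcommitting = false;
}

/*
 * Revalidate against the server. The cache is declared valid afterwards
 * unless keep_invalid is set and a layoutcommit is still outstanding.
 */
static void toy_revalidate(struct toy_inode *in, bool keep_invalid)
{
    in->cached_size = in->server_size;
    in->attr_invalid = keep_invalid && in->layoutcommit;
}

/* A getattr that only talks to the server when the cache is invalid. */
static uint64_t toy_getattr(struct toy_inode *in, bool keep_invalid)
{
    if (in->attr_invalid) {
        if (in->layoutcommit)
            toy_commit_release(in, toy_commit_start(in));
        toy_revalidate(in, keep_invalid);
    }
    return in->cached_size;
}

/* Replays the trace and returns what the second getattr reports. */
static uint64_t toy_replay(bool keep_invalid)
{
    struct toy_inode in = {0};
    uint64_t in_flight;

    toy_write_done(&in, 4096);
    in_flight = toy_commit_start(&in);      /* 1st getattr begins */
    toy_write_done(&in, 8192);              /* write races with the commit */
    toy_commit_release(&in, in_flight);     /* size goes back to 4096 */
    toy_revalidate(&in, keep_invalid);      /* 1st getattr finishes */
    return toy_getattr(&in, keep_invalid);  /* 2nd getattr */
}
```

In this model `toy_replay(false)` reports 4096 from the second getattr, matching the trace, while `toy_replay(true)` forces another commit and revalidation and reports 8192.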
* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics 2016-07-28 9:47 ` Benjamin Coddington @ 2016-07-28 12:31 ` Trond Myklebust 2016-07-28 14:04 ` Trond Myklebust 2016-07-28 15:33 ` Benjamin Coddington 0 siblings, 2 replies; 69+ messages in thread From: Trond Myklebust @ 2016-07-28 12:31 UTC (permalink / raw) To: Coddington Benjamin; +Cc: hch, List Linux NFS Mailing

> On Jul 28, 2016, at 05:47, Benjamin Coddington <bcodding@redhat.com> wrote:
>
>
> On 27 Jul 2016, at 14:05, Trond Myklebust wrote:
>
>>> On Jul 27, 2016, at 12:14, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>
>>> On 27 Jul 2016, at 8:31, Trond Myklebust wrote:
>>>
>>>>> On Jul 27, 2016, at 08:15, Trond Myklebust <trondmy@primarydata.com> wrote:
>>>>>
>>>>>
>>>>>> On Jul 27, 2016, at 07:55, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>>>
>>>>>> After adding more debugging, I see that all of that is working correctly,
>>>>>> but the first LAYOUTCOMMIT is taking the size back down to 4096 from the
>>>>>> last nfs_writeback_done(), and the second LAYOUTCOMMIT never brings it back
>>>>>> up again.
>>>>>>
>>>>>
>>>>> Excellent! Thanks for debugging that.
>>>>>
>>>>>> Now I see that we should be marking the block extents as written atomically with
>>>>>> setting LAYOUTCOMMIT and nfsi->layout->plh_lwb, otherwise a LAYOUTCOMMIT can
>>>>>> collect extents just added from the next bl_write_cleanup().  Then, the next
>>>>>> LAYOUTCOMMIT fails, and all we're left with is the size from the first
>>>>>> LAYOUTCOMMIT.  Not sure if that particular problem is the whole fix, but
>>>>>> that's something to work on.
>>>>>>
>>>>>> I see ways to fix that:
>>>>>>
>>>>>> - make a new pnfs_set_layoutcommit_locked() that can be used to call
>>>>>>   ext_tree_mark_written() inside the i_lock
>>>>>>
>>>>>> - make another pnfs_layoutdriver_type operation to be used within
>>>>>>   pnfs_set_layoutcommit (mark_layoutcommit? set_layoutcommit?), and call
>>>>>>   ext_tree_mark_written() within that..
>>>>>>
>>>>>> - have .prepare_layoutcommit return a new positive plh_lwb that would
>>>>>>   extend the current LAYOUTCOMMIT
>>>>>>
>>>>>> - make ext_tree_prepare_commit only encode up to plh_lwb
>>>>>
>>>>> I see no reason why ext_tree_prepare_commit() shouldn’t be allowed to extend the args->lastbytewritten. This is a metadata operation that is owned by the pNFS layout driver.
>>>>> The only thing I’d note is you should then rewrite the failure case in pnfs_layoutcommit_inode() so that it doesn’t rely on the saved “end_pos”, but uses args->lastbytewritten instead (with a comment to the effect why)…
>>>>
>>>> In fact, given the potential for races here, I think the right thing to do is to have ext_tree_prepare_commit() always set the correct value for args->lastbytewritten.
>>>
>>> OK, that has cleared up that common failure case that was getting in the
>>> way, but now it can still fail like this:
>>>
>>
>> Good progress! :-)
>>
>>> nfs_writeback_update_inode sets size 4096 w/ NFS_INO_INVALID_ATTR set, and sets NFS_INO_LAYOUTCOMMIT
>>> 1st nfs_getattr -> pnfs_layoutcommit_inode starts, clears layoutcommit flag sets NFS_INO_LAYOUTCOMMITING
>>> nfs_writeback_update_inode sets size 8192 w/ NFS_INO_INVALID_ATTR set, and sets NFS_INO_LAYOUTCOMMIT
>>> 1st nfs_getattr -> nfs4_layoutcommit_release sets size 4096, NFS_INO_INVALID_ATTR set, clears NFS_INO_LAYOUTCOMMITTING
>>> 1st nfs_getattr -> __revalidate_inode sets size 4096, NFS_INO_INVALID_ATTR not set.. cache is valid
>>> 2nd nfs_getattr immediately returns 4096 even though NFS_INO_LAYOUTCOMMIT
>>
>> Is this being tested on top of the current linux-next/testing? Normally, I’d expect http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=10b7e9ad44881fcd46ac24eb7374377c6e8962ed to cause 1st nfs_getattr() to not declare the cache valid.
>
> Yes, this is on your linux-next branch.
>
> When the 1st nfs_getattr() goes through nfs_update_inode() the second time
> (during __revalidate_inode), NFS_INO_INVALID_ATTR is never set by anything,
> since all the attributes returned match the cache.  So even though
> NFS_INO_LAYOUTCOMMIT is set, and the cache_validity variable is "false",
> the NFS_INO_INVALID_ATTR is never set in the "invalid" local variable.
>
> Should pnfs_layoutcommit_outstanding() always set NFS_INO_INVALID_ATTR?
>
> Ben

nfs_post_op_update_inode_locked() should be doing that as part of the callchain in nfs_writeback_update_inode().

^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics 2016-07-28 12:31 ` Trond Myklebust @ 2016-07-28 14:04 ` Trond Myklebust 2016-07-28 15:38 ` Benjamin Coddington 2016-07-28 15:33 ` Benjamin Coddington 1 sibling, 1 reply; 69+ messages in thread From: Trond Myklebust @ 2016-07-28 14:04 UTC (permalink / raw) To: Coddington Benjamin; +Cc: hch, List Linux NFS Mailing

> On Jul 28, 2016, at 08:31, Trond Myklebust <trondmy@primarydata.com> wrote:
>
>
>> On Jul 28, 2016, at 05:47, Benjamin Coddington <bcodding@redhat.com> wrote:
>>
>>
>> On 27 Jul 2016, at 14:05, Trond Myklebust wrote:
>>
>>>> On Jul 27, 2016, at 12:14, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>
>>>> On 27 Jul 2016, at 8:31, Trond Myklebust wrote:
>>>>
>>>>>> On Jul 27, 2016, at 08:15, Trond Myklebust <trondmy@primarydata.com> wrote:
>>>>>>
>>>>>>
>>>>>>> On Jul 27, 2016, at 07:55, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>>>>
>>>>>>> After adding more debugging, I see that all of that is working correctly,
>>>>>>> but the first LAYOUTCOMMIT is taking the size back down to 4096 from the
>>>>>>> last nfs_writeback_done(), and the second LAYOUTCOMMIT never brings it back
>>>>>>> up again.
>>>>>>>
>>>>>>
>>>>>> Excellent! Thanks for debugging that.
>>>>>>
>>>>>>> Now I see that we should be marking the block extents as written atomically with
>>>>>>> setting LAYOUTCOMMIT and nfsi->layout->plh_lwb, otherwise a LAYOUTCOMMIT can
>>>>>>> collect extents just added from the next bl_write_cleanup().  Then, the next
>>>>>>> LAYOUTCOMMIT fails, and all we're left with is the size from the first
>>>>>>> LAYOUTCOMMIT.  Not sure if that particular problem is the whole fix, but
>>>>>>> that's something to work on.
>>>>>>>
>>>>>>> I see ways to fix that:
>>>>>>>
>>>>>>> - make a new pnfs_set_layoutcommit_locked() that can be used to call
>>>>>>>  ext_tree_mark_written() inside the i_lock
>>>>>>>
>>>>>>> - make another pnfs_layoutdriver_type operation to be used within
>>>>>>>  pnfs_set_layoutcommit (mark_layoutcommit? set_layoutcommit?), and call
>>>>>>>  ext_tree_mark_written() within that..
>>>>>>>
>>>>>>> - have .prepare_layoutcommit return a new positive plh_lwb that would
>>>>>>>  extend the current LAYOUTCOMMIT
>>>>>>>
>>>>>>> - make ext_tree_prepare_commit only encode up to plh_lwb
>>>>>>
>>>>>> I see no reason why ext_tree_prepare_commit() shouldn’t be allowed to extend the args->lastbytewritten. This is a metadata operation that is owned by the pNFS layout driver.
>>>>>> The only thing I’d note is you should then rewrite the failure case in pnfs_layoutcommit_inode() so that it doesn’t rely on the saved “end_pos”, but uses args->lastbytewritten instead (with a comment to the effect why)…
>>>>>
>>>>> In fact, given the potential for races here, I think the right thing to do is to have ext_tree_prepare_commit() always set the correct value for args->lastbytewritten.
>>>>
>>>> OK, that has cleared up that common failure case that was getting in the
>>>> way, but now it can still fail like this:
>>>>
>>>
>>> Good progress! :-)
>>>
>>>> nfs_writeback_update_inode sets size 4096 w/ NFS_INO_INVALID_ATTR set, and sets NFS_INO_LAYOUTCOMMIT
>>>> 1st nfs_getattr -> pnfs_layoutcommit_inode starts, clears layoutcommit flag sets NFS_INO_LAYOUTCOMMITING
>>>> nfs_writeback_update_inode sets size 8192 w/ NFS_INO_INVALID_ATTR set, and sets NFS_INO_LAYOUTCOMMIT
>>>> 1st nfs_getattr -> nfs4_layoutcommit_release sets size 4096, NFS_INO_INVALID_ATTR set, clears NFS_INO_LAYOUTCOMMITTING
>>>> 1st nfs_getattr -> __revalidate_inode sets size 4096, NFS_INO_INVALID_ATTR not set.. cache is valid
>>>> 2nd nfs_getattr immediately returns 4096 even though NFS_INO_LAYOUTCOMMIT
>>>
>>> Is this being tested on top of the current linux-next/testing? Normally, I’d expect http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=10b7e9ad44881fcd46ac24eb7374377c6e8962ed to cause 1st nfs_getattr() to not declare the cache valid.
>>
>> Yes, this is on your linux-next branch.
>>
>> When the 1st nfs_getattr() goes through nfs_update_inode() the second time
>> (during __revalidate_inode), NFS_INO_INVALID_ATTR is never set by anything,
>> since all the attributes returned match the cache.  So even though
>> NFS_INO_LAYOUTCOMMIT is set, and the cache_validity variable is "false",
>> the NFS_INO_INVALID_ATTR is never set in the "invalid" local variable.
>>
>> Should pnfs_layoutcommit_outstanding() always set NFS_INO_INVALID_ATTR?
>>
>> Ben
>
> nfs_post_op_update_inode_locked() should be doing that as part of the callchain in nfs_writeback_update_inode().
>

By the way. I just noticed that nothing appears to be using the attributes we retrieve as part of the layoutcommit call. Does adding a nfs_refresh_inode() to the “success” path in nfs4_layoutcommit_done() perhaps help?

^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics 2016-07-28 14:04 ` Trond Myklebust @ 2016-07-28 15:38 ` Benjamin Coddington 2016-07-28 15:39 ` Trond Myklebust 0 siblings, 1 reply; 69+ messages in thread From: Benjamin Coddington @ 2016-07-28 15:38 UTC (permalink / raw) To: Trond Myklebust; +Cc: hch, List Linux NFS Mailing

On 28 Jul 2016, at 10:04, Trond Myklebust wrote:

>> On Jul 28, 2016, at 08:31, Trond Myklebust <trondmy@primarydata.com> wrote:
>>
>>
>>> On Jul 28, 2016, at 05:47, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>
>>>
>>> On 27 Jul 2016, at 14:05, Trond Myklebust wrote:
>>>
>>>>> On Jul 27, 2016, at 12:14, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>>
>>>>> On 27 Jul 2016, at 8:31, Trond Myklebust wrote:
>>>>>
>>>>>>> On Jul 27, 2016, at 08:15, Trond Myklebust <trondmy@primarydata.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>> On Jul 27, 2016, at 07:55, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>>>>>
>>>>>>>> After adding more debugging, I see that all of that is working correctly,
>>>>>>>> but the first LAYOUTCOMMIT is taking the size back down to 4096 from the
>>>>>>>> last nfs_writeback_done(), and the second LAYOUTCOMMIT never brings it back
>>>>>>>> up again.
>>>>>>>>
>>>>>>>
>>>>>>> Excellent! Thanks for debugging that.
>>>>>>>
>>>>>>>> Now I see that we should be marking the block extents as written atomically with
>>>>>>>> setting LAYOUTCOMMIT and nfsi->layout->plh_lwb, otherwise a LAYOUTCOMMIT can
>>>>>>>> collect extents just added from the next bl_write_cleanup(). Then, the next
>>>>>>>> LAYOUTCOMMIT fails, and all we're left with is the size from the first
>>>>>>>> LAYOUTCOMMIT. Not sure if that particular problem is the whole fix, but
>>>>>>>> that's something to work on.
>>>>>>>>
>>>>>>>> I see ways to fix that:
>>>>>>>>
>>>>>>>> - make a new pnfs_set_layoutcommit_locked() that can be used to call
>>>>>>>> ext_tree_mark_written() inside the i_lock
>>>>>>>>
>>>>>>>> - make another pnfs_layoutdriver_type operation to be used within
>>>>>>>> pnfs_set_layoutcommit (mark_layoutcommit? set_layoutcommit?), and call
>>>>>>>> ext_tree_mark_written() within that..
>>>>>>>>
>>>>>>>> - have .prepare_layoutcommit return a new positive plh_lwb that would
>>>>>>>> extend the current LAYOUTCOMMIT
>>>>>>>>
>>>>>>>> - make ext_tree_prepare_commit only encode up to plh_lwb
>>>>>>>
>>>>>>> I see no reason why ext_tree_prepare_commit() shouldn’t be
>>>>>>> allowed to extend the args->lastbytewritten. This is a metadata
>>>>>>> operation that is owned by the pNFS layout driver.
>>>>>>> The only thing I’d note is you should then rewrite the failure
>>>>>>> case in pnfs_layoutcommit_inode() so that it doesn’t rely on
>>>>>>> the saved “end_pos”, but uses args->lastbytewritten instead
>>>>>>> (with a comment to the effect why)…
>>>>>>
>>>>>> In fact, given the potential for races here, I think the right
>>>>>> thing to do is to have ext_tree_prepare_commit() always set the
>>>>>> correct value for args->lastbytewritten.
>>>>>
>>>>> OK, that has cleared up that common failure case that was getting in the
>>>>> way, but now it can still fail like this:
>>>>>
>>>>
>>>> Good progress! :-)
>>>>
>>>>> nfs_writeback_update_inode sets size 4096 w/ NFS_INO_INVALID_ATTR set, and sets NFS_INO_LAYOUTCOMMIT
>>>>> 1st nfs_getattr -> pnfs_layoutcommit_inode starts, clears layoutcommit flag sets NFS_INO_LAYOUTCOMMITING
>>>>> nfs_writeback_update_inode sets size 8192 w/ NFS_INO_INVALID_ATTR set, and sets NFS_INO_LAYOUTCOMMIT
>>>>> 1st nfs_getattr -> nfs4_layoutcommit_release sets size 4096, NFS_INO_INVALID_ATTR set, clears NFS_INO_LAYOUTCOMMITTING
>>>>> 1st nfs_getattr -> __revalidate_inode sets size 4096, NFS_INO_INVALID_ATTR not set.. cache is valid
>>>>> 2nd nfs_getattr immediately returns 4096 even though NFS_INO_LAYOUTCOMMIT
>>>>
>>>> Is this being tested on top of the current linux-next/testing?
>>>> Normally, I’d expect
>>>> http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=10b7e9ad44881fcd46ac24eb7374377c6e8962ed
>>>> to cause 1st nfs_getattr() to not declare the cache valid.
>>>
>>> Yes, this is on your linux-next branch.
>>>
>>> When the 1st nfs_getattr() goes through nfs_update_inode() the second time
>>> (during __revalidate_inode), NFS_INO_INVALID_ATTR is never set by anything,
>>> since all the attributes returned match the cache. So even though
>>> NFS_INO_LAYOUTCOMMIT is set, and the cache_validity variable is "false",
>>> the NFS_INO_INVALID_ATTR is never set in the "invalid" local variable.
>>>
>>> Should pnfs_layoutcommit_outstanding() always set NFS_INO_INVALID_ATTR?
>>>
>>> Ben
>>
>> nfs_post_op_update_inode_locked() should be doing that as part of the
>> callchain in nfs_writeback_update_inode().
>>
>
> By the way. I just noticed that nothing appears to be using the
> attributes we retrieve as part of the layoutcommit call. Does adding a
> nfs_refresh_inode() to the “success” path in
> nfs4_layoutcommit_done() perhaps help?

We do it in layoutcommit_release:

nfs4_layoutcommit_done [nfsv4]() {
  ...
}
nfs4_layoutcommit_release [nfsv4]() {
  ...
  nfs_post_op_update_inode_force_wcc [nfs]() {
    nfs_post_op_update_inode_force_wcc_locked [nfs]() {
      nfs_post_op_update_inode_locked [nfs]() {
        nfs4_have_delegation [nfsv4]() {
          nfs4_do_check_delegation [nfsv4]();
        }
        nfs_refresh_inode_locked [nfs]() {
          nfs_update_inode [nfs]() {


Should I still try adding it in nfs4_layoutcommit_done()?

Ben

^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics 2016-07-28 15:38 ` Benjamin Coddington @ 2016-07-28 15:39 ` Trond Myklebust 0 siblings, 0 replies; 69+ messages in thread From: Trond Myklebust @ 2016-07-28 15:39 UTC (permalink / raw) To: Coddington Benjamin; +Cc: hch, List Linux NFS Mailing

> On Jul 28, 2016, at 11:38, Benjamin Coddington <bcodding@redhat.com> wrote:
>
>
>
> On 28 Jul 2016, at 10:04, Trond Myklebust wrote:
>
>>> On Jul 28, 2016, at 08:31, Trond Myklebust <trondmy@primarydata.com> wrote:
>>>
>>>
>>>> On Jul 28, 2016, at 05:47, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>
>>>>
>>>> On 27 Jul 2016, at 14:05, Trond Myklebust wrote:
>>>>
>>>>>> On Jul 27, 2016, at 12:14, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>>>
>>>>>> On 27 Jul 2016, at 8:31, Trond Myklebust wrote:
>>>>>>
>>>>>>>> On Jul 27, 2016, at 08:15, Trond Myklebust <trondmy@primarydata.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>> On Jul 27, 2016, at 07:55, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>>>>>>
>>>>>>>>> After adding more debugging, I see that all of that is working correctly,
>>>>>>>>> but the first LAYOUTCOMMIT is taking the size back down to 4096 from the
>>>>>>>>> last nfs_writeback_done(), and the second LAYOUTCOMMIT never brings it back
>>>>>>>>> up again.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Excellent! Thanks for debugging that.
>>>>>>>>
>>>>>>>>> Now I see that we should be marking the block extents as written atomically with
>>>>>>>>> setting LAYOUTCOMMIT and nfsi->layout->plh_lwb, otherwise a LAYOUTCOMMIT can
>>>>>>>>> collect extents just added from the next bl_write_cleanup().  Then, the next
>>>>>>>>> LAYOUTCOMMIT fails, and all we're left with is the size from the first
>>>>>>>>> LAYOUTCOMMIT.  Not sure if that particular problem is the whole fix, but
>>>>>>>>> that's something to work on.
>>>>>>>>>
>>>>>>>>> I see ways to fix that:
>>>>>>>>>
>>>>>>>>> - make a new pnfs_set_layoutcommit_locked() that can be used to call
>>>>>>>>> ext_tree_mark_written() inside the i_lock
>>>>>>>>>
>>>>>>>>> - make another pnfs_layoutdriver_type operation to be used within
>>>>>>>>> pnfs_set_layoutcommit (mark_layoutcommit? set_layoutcommit?), and call
>>>>>>>>> ext_tree_mark_written() within that..
>>>>>>>>>
>>>>>>>>> - have .prepare_layoutcommit return a new positive plh_lwb that would
>>>>>>>>> extend the current LAYOUTCOMMIT
>>>>>>>>>
>>>>>>>>> - make ext_tree_prepare_commit only encode up to plh_lwb
>>>>>>>>
>>>>>>>> I see no reason why ext_tree_prepare_commit() shouldn’t be allowed to extend the args->lastbytewritten. This is a metadata operation that is owned by the pNFS layout driver.
>>>>>>>> The only thing I’d note is you should then rewrite the failure case in pnfs_layoutcommit_inode() so that it doesn’t rely on the saved “end_pos”, but uses args->lastbytewritten instead (with a comment to the effect why)…
>>>>>>>
>>>>>>> In fact, given the potential for races here, I think the right thing to do is to have ext_tree_prepare_commit() always set the correct value for args->lastbytewritten.
>>>>>>
>>>>>> OK, that has cleared up that common failure case that was getting in the
>>>>>> way, but now it can still fail like this:
>>>>>>
>>>>>
>>>>> Good progress! :-)
>>>>>
>>>>>> nfs_writeback_update_inode sets size 4096 w/ NFS_INO_INVALID_ATTR set, and sets NFS_INO_LAYOUTCOMMIT
>>>>>> 1st nfs_getattr -> pnfs_layoutcommit_inode starts, clears layoutcommit flag sets NFS_INO_LAYOUTCOMMITING
>>>>>> nfs_writeback_update_inode sets size 8192 w/ NFS_INO_INVALID_ATTR set, and sets NFS_INO_LAYOUTCOMMIT
>>>>>> 1st nfs_getattr -> nfs4_layoutcommit_release sets size 4096, NFS_INO_INVALID_ATTR set, clears NFS_INO_LAYOUTCOMMITTING
>>>>>> 1st nfs_getattr -> __revalidate_inode sets size 4096, NFS_INO_INVALID_ATTR not set.. cache is valid
>>>>>> 2nd nfs_getattr immediately returns 4096 even though NFS_INO_LAYOUTCOMMIT
>>>>>
>>>>> Is this being tested on top of the current linux-next/testing? Normally, I’d expect http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=10b7e9ad44881fcd46ac24eb7374377c6e8962ed to cause 1st nfs_getattr() to not declare the cache valid.
>>>>
>>>> Yes, this is on your linux-next branch.
>>>>
>>>> When the 1st nfs_getattr() goes through nfs_update_inode() the second time
>>>> (during __revalidate_inode), NFS_INO_INVALID_ATTR is never set by anything,
>>>> since all the attributes returned match the cache.  So even though
>>>> NFS_INO_LAYOUTCOMMIT is set, and the cache_validity variable is "false",
>>>> the NFS_INO_INVALID_ATTR is never set in the "invalid" local variable.
>>>>
>>>> Should pnfs_layoutcommit_outstanding() always set NFS_INO_INVALID_ATTR?
>>>>
>>>> Ben
>>>
>>> nfs_post_op_update_inode_locked() should be doing that as part of the callchain in nfs_writeback_update_inode().
>>>
>>
>> By the way. I just noticed that nothing appears to be using the attributes we retrieve as part of the layoutcommit call. Does adding a nfs_refresh_inode() to the “success” path in nfs4_layoutcommit_done() perhaps help?
>
> We do it in layoutcommit_release:
>
> nfs4_layoutcommit_done [nfsv4]() {
>   ...
> }
> nfs4_layoutcommit_release [nfsv4]() {
>   ...
>   nfs_post_op_update_inode_force_wcc [nfs]() {
>     nfs_post_op_update_inode_force_wcc_locked [nfs]() {
>       nfs_post_op_update_inode_locked [nfs]() {
>         nfs4_have_delegation [nfsv4]() {
>           nfs4_do_check_delegation [nfsv4]();
>         }
>         nfs_refresh_inode_locked [nfs]() {
>           nfs_update_inode [nfs]() {
>
>
> Should I still try adding it in nfs4_layoutcommit_done()?

No, that’s OK. As long as we’re using it… Please see my previous email, though, for how we might change nfs_update_inode()

^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics 2016-07-28 12:31 ` Trond Myklebust 2016-07-28 14:04 ` Trond Myklebust @ 2016-07-28 15:33 ` Benjamin Coddington 2016-07-28 15:36 ` Trond Myklebust 1 sibling, 1 reply; 69+ messages in thread From: Benjamin Coddington @ 2016-07-28 15:33 UTC (permalink / raw) To: Trond Myklebust; +Cc: hch, List Linux NFS Mailing

On 28 Jul 2016, at 8:31, Trond Myklebust wrote:

>> On Jul 28, 2016, at 05:47, Benjamin Coddington <bcodding@redhat.com> wrote:
>>
>>
>> On 27 Jul 2016, at 14:05, Trond Myklebust wrote:
>>
>>>> On Jul 27, 2016, at 12:14, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>
>>>> On 27 Jul 2016, at 8:31, Trond Myklebust wrote:
>>>>
>>>>>> On Jul 27, 2016, at 08:15, Trond Myklebust <trondmy@primarydata.com> wrote:
>>>>>>
>>>>>>
>>>>>>> On Jul 27, 2016, at 07:55, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>>>>
>>>>>>> After adding more debugging, I see that all of that is working correctly,
>>>>>>> but the first LAYOUTCOMMIT is taking the size back down to 4096 from the
>>>>>>> last nfs_writeback_done(), and the second LAYOUTCOMMIT never brings it back
>>>>>>> up again.
>>>>>>>
>>>>>>
>>>>>> Excellent! Thanks for debugging that.
>>>>>>
>>>>>>> Now I see that we should be marking the block extents as written atomically with
>>>>>>> setting LAYOUTCOMMIT and nfsi->layout->plh_lwb, otherwise a LAYOUTCOMMIT can
>>>>>>> collect extents just added from the next bl_write_cleanup(). Then, the next
>>>>>>> LAYOUTCOMMIT fails, and all we're left with is the size from the first
>>>>>>> LAYOUTCOMMIT. Not sure if that particular problem is the whole fix, but
>>>>>>> that's something to work on.
>>>>>>>
>>>>>>> I see ways to fix that:
>>>>>>>
>>>>>>> - make a new pnfs_set_layoutcommit_locked() that can be used to call
>>>>>>>  ext_tree_mark_written() inside the i_lock
>>>>>>>
>>>>>>> - make another pnfs_layoutdriver_type operation to be used within
>>>>>>>  pnfs_set_layoutcommit (mark_layoutcommit? set_layoutcommit?), and call
>>>>>>>  ext_tree_mark_written() within that..
>>>>>>>
>>>>>>> - have .prepare_layoutcommit return a new positive plh_lwb that would
>>>>>>>  extend the current LAYOUTCOMMIT
>>>>>>>
>>>>>>> - make ext_tree_prepare_commit only encode up to plh_lwb
>>>>>>
>>>>>> I see no reason why ext_tree_prepare_commit() shouldn’t be
>>>>>> allowed to extend the args->lastbytewritten. This is a metadata
>>>>>> operation that is owned by the pNFS layout driver.
>>>>>> The only thing I’d note is you should then rewrite the failure
>>>>>> case in pnfs_layoutcommit_inode() so that it doesn’t rely on
>>>>>> the saved “end_pos”, but uses args->lastbytewritten instead
>>>>>> (with a comment to the effect why)…
>>>>>
>>>>> In fact, given the potential for races here, I think the right
>>>>> thing to do is to have ext_tree_prepare_commit() always set the
>>>>> correct value for args->lastbytewritten.
>>>>
>>>> OK, that has cleared up that common failure case that was getting in the
>>>> way, but now it can still fail like this:
>>>>
>>>
>>> Good progress! :-)
>>>
>>>> nfs_writeback_update_inode sets size 4096 w/ NFS_INO_INVALID_ATTR set, and sets NFS_INO_LAYOUTCOMMIT
>>>> 1st nfs_getattr -> pnfs_layoutcommit_inode starts, clears layoutcommit flag sets NFS_INO_LAYOUTCOMMITING
>>>> nfs_writeback_update_inode sets size 8192 w/ NFS_INO_INVALID_ATTR set, and sets NFS_INO_LAYOUTCOMMIT
>>>> 1st nfs_getattr -> nfs4_layoutcommit_release sets size 4096, NFS_INO_INVALID_ATTR set, clears NFS_INO_LAYOUTCOMMITTING
>>>> 1st nfs_getattr -> __revalidate_inode sets size 4096, NFS_INO_INVALID_ATTR not set.. cache is valid
>>>> 2nd nfs_getattr immediately returns 4096 even though NFS_INO_LAYOUTCOMMIT
>>>
>>> Is this being tested on top of the current linux-next/testing?
>>> Normally, I’d expect
>>> http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=10b7e9ad44881fcd46ac24eb7374377c6e8962ed
>>> to cause 1st nfs_getattr() to not declare the cache valid.
>>
>> Yes, this is on your linux-next branch.
>>
>> When the 1st nfs_getattr() goes through nfs_update_inode() the second time
>> (during __revalidate_inode), NFS_INO_INVALID_ATTR is never set by anything,
>> since all the attributes returned match the cache. So even though
>> NFS_INO_LAYOUTCOMMIT is set, and the cache_validity variable is "false",
>> the NFS_INO_INVALID_ATTR is never set in the "invalid" local variable.
>>
>> Should pnfs_layoutcommit_outstanding() always set NFS_INO_INVALID_ATTR?
>>
>> Ben
>
> nfs_post_op_update_inode_locked() should be doing that as part of the
> callchain in nfs_writeback_update_inode().

And it is, and the bit persists through the next layoutcommit, it is the next GETATTR response that finds that all the attributes are the same and the bit is cleared.

^ permalink raw reply [flat|nested] 69+ messages in thread
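The six-step interleaving quoted in this message can be replayed in a small userspace model. This is only a sketch under stated assumptions: the `model_*` names, the flag values, and the struct are hypothetical and are not the kernel's definitions; each helper simply mirrors one line of the trace as Ben describes it.

```c
#include <stdbool.h>

/* Model-only flag bits; the values are illustrative, not the kernel's. */
enum {
	MODEL_INVALID_ATTR     = 1u << 0,
	MODEL_LAYOUTCOMMIT     = 1u << 1,
	MODEL_LAYOUTCOMMITTING = 1u << 2,
};

/* Hypothetical stand-in for the cached inode state discussed in the thread. */
struct model_inode {
	unsigned long long size;
	unsigned int flags;
};

/* A write completes: cached size grows, attributes and layout are dirty. */
static void model_writeback_update(struct model_inode *ino,
				   unsigned long long size)
{
	ino->size = size;
	ino->flags |= MODEL_INVALID_ATTR | MODEL_LAYOUTCOMMIT;
}

/* LAYOUTCOMMIT is sent; returns the size the server is told about. */
static unsigned long long model_layoutcommit_start(struct model_inode *ino)
{
	ino->flags &= ~MODEL_LAYOUTCOMMIT;
	ino->flags |= MODEL_LAYOUTCOMMITTING;
	return ino->size;
}

/* The LAYOUTCOMMIT reply arrives carrying the (by now stale) server size. */
static void model_layoutcommit_release(struct model_inode *ino,
				       unsigned long long server_size)
{
	ino->size = server_size;
	ino->flags |= MODEL_INVALID_ATTR;
	ino->flags &= ~MODEL_LAYOUTCOMMITTING;
}

/* The GETATTR reply matches the cache, so the invalid bit is dropped. */
static void model_revalidate(struct model_inode *ino,
			     unsigned long long server_size)
{
	ino->size = server_size;
	ino->flags &= ~MODEL_INVALID_ATTR;
}

/* True when a later getattr would be answered from the cache. */
static bool model_getattr_served_from_cache(const struct model_inode *ino)
{
	return !(ino->flags & MODEL_INVALID_ATTR);
}

/* Replays the trace; returns the size the second getattr would report. */
static unsigned long long model_replay_race(struct model_inode *ino)
{
	unsigned long long server_size;

	model_writeback_update(ino, 4096);
	server_size = model_layoutcommit_start(ino);
	model_writeback_update(ino, 8192);
	model_layoutcommit_release(ino, server_size);
	model_revalidate(ino, server_size);
	return ino->size;
}
```

In this model, `model_replay_race()` ends with the cached size back at 4096, the layoutcommit flag still set, and the cache considered valid, which is the state the second nfs_getattr observes in the trace.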
* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics 2016-07-28 15:33 ` Benjamin Coddington @ 2016-07-28 15:36 ` Trond Myklebust 2016-07-28 16:40 ` Benjamin Coddington 0 siblings, 1 reply; 69+ messages in thread From: Trond Myklebust @ 2016-07-28 15:36 UTC (permalink / raw) To: Coddington Benjamin; +Cc: hch, List Linux NFS Mailing

> On Jul 28, 2016, at 11:33, Benjamin Coddington <bcodding@redhat.com> wrote:
>
> On 28 Jul 2016, at 8:31, Trond Myklebust wrote:
>
>>> On Jul 28, 2016, at 05:47, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>
>>>
>>> On 27 Jul 2016, at 14:05, Trond Myklebust wrote:
>>>
>>>>> On Jul 27, 2016, at 12:14, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>>
>>>>> On 27 Jul 2016, at 8:31, Trond Myklebust wrote:
>>>>>
>>>>>>> On Jul 27, 2016, at 08:15, Trond Myklebust <trondmy@primarydata.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>> On Jul 27, 2016, at 07:55, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>>>>>
>>>>>>>> After adding more debugging, I see that all of that is working correctly,
>>>>>>>> but the first LAYOUTCOMMIT is taking the size back down to 4096 from the
>>>>>>>> last nfs_writeback_done(), and the second LAYOUTCOMMIT never brings it back
>>>>>>>> up again.
>>>>>>>>
>>>>>>>
>>>>>>> Excellent! Thanks for debugging that.
>>>>>>>
>>>>>>>> Now I see that we should be marking the block extents as written atomically with
>>>>>>>> setting LAYOUTCOMMIT and nfsi->layout->plh_lwb, otherwise a LAYOUTCOMMIT can
>>>>>>>> collect extents just added from the next bl_write_cleanup().  Then, the next
>>>>>>>> LAYOUTCOMMIT fails, and all we're left with is the size from the first
>>>>>>>> LAYOUTCOMMIT.  Not sure if that particular problem is the whole fix, but
>>>>>>>> that's something to work on.
>>>>>>>>
>>>>>>>> I see ways to fix that:
>>>>>>>>
>>>>>>>> - make a new pnfs_set_layoutcommit_locked() that can be used to call
>>>>>>>>  ext_tree_mark_written() inside the i_lock
>>>>>>>>
>>>>>>>> - make another pnfs_layoutdriver_type operation to be used within
>>>>>>>>  pnfs_set_layoutcommit (mark_layoutcommit? set_layoutcommit?), and call
>>>>>>>>  ext_tree_mark_written() within that..
>>>>>>>>
>>>>>>>> - have .prepare_layoutcommit return a new positive plh_lwb that would
>>>>>>>>  extend the current LAYOUTCOMMIT
>>>>>>>>
>>>>>>>> - make ext_tree_prepare_commit only encode up to plh_lwb
>>>>>>>
>>>>>>> I see no reason why ext_tree_prepare_commit() shouldn’t be allowed to extend the args->lastbytewritten. This is a metadata operation that is owned by the pNFS layout driver.
>>>>>>> The only thing I’d note is you should then rewrite the failure case in pnfs_layoutcommit_inode() so that it doesn’t rely on the saved “end_pos”, but uses args->lastbytewritten instead (with a comment to the effect why)…
>>>>>>
>>>>>> In fact, given the potential for races here, I think the right thing to do is to have ext_tree_prepare_commit() always set the correct value for args->lastbytewritten.
>>>>>
>>>>> OK, that has cleared up that common failure case that was getting in the
>>>>> way, but now it can still fail like this:
>>>>>
>>>>
>>>> Good progress! :-)
>>>>
>>>>> nfs_writeback_update_inode sets size 4096 w/ NFS_INO_INVALID_ATTR set, and sets NFS_INO_LAYOUTCOMMIT
>>>>> 1st nfs_getattr -> pnfs_layoutcommit_inode starts, clears layoutcommit flag sets NFS_INO_LAYOUTCOMMITING
>>>>> nfs_writeback_update_inode sets size 8192 w/ NFS_INO_INVALID_ATTR set, and sets NFS_INO_LAYOUTCOMMIT
>>>>> 1st nfs_getattr -> nfs4_layoutcommit_release sets size 4096, NFS_INO_INVALID_ATTR set, clears NFS_INO_LAYOUTCOMMITTING
>>>>> 1st nfs_getattr -> __revalidate_inode sets size 4096, NFS_INO_INVALID_ATTR not set.. cache is valid
>>>>> 2nd nfs_getattr immediately returns 4096 even though NFS_INO_LAYOUTCOMMIT
>>>>
>>>> Is this being tested on top of the current linux-next/testing? Normally, I’d expect http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=10b7e9ad44881fcd46ac24eb7374377c6e8962ed to cause 1st nfs_getattr() to not declare the cache valid.
>>>
>>> Yes, this is on your linux-next branch.
>>>
>>> When the 1st nfs_getattr() goes through nfs_update_inode() the second time
>>> (during __revalidate_inode), NFS_INO_INVALID_ATTR is never set by anything,
>>> since all the attributes returned match the cache.  So even though
>>> NFS_INO_LAYOUTCOMMIT is set, and the cache_validity variable is "false",
>>> the NFS_INO_INVALID_ATTR is never set in the "invalid" local variable.
>>>
>>> Should pnfs_layoutcommit_outstanding() always set NFS_INO_INVALID_ATTR?
>>>
>>> Ben
>>
>> nfs_post_op_update_inode_locked() should be doing that as part of the callchain in nfs_writeback_update_inode().
>
> And it is, and the bit persists through the next layoutcommit, it is the next GETATTR response that finds that all the attributes are the same and the bit is cleared.
>

So what if we require that nfsi->cache_validity be set to save_cache_validity & NFS_INO_INVALID_ATTR at a minimum if pnfs_layoutcommit_outstanding()?

^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics 2016-07-28 15:36 ` Trond Myklebust @ 2016-07-28 16:40 ` Benjamin Coddington 2016-07-28 16:41 ` Trond Myklebust 0 siblings, 1 reply; 69+ messages in thread From: Benjamin Coddington @ 2016-07-28 16:40 UTC (permalink / raw) To: Trond Myklebust; +Cc: hch, List Linux NFS Mailing

On 28 Jul 2016, at 11:36, Trond Myklebust wrote:

>> On Jul 28, 2016, at 11:33, Benjamin Coddington <bcodding@redhat.com> wrote:
>>
>> On 28 Jul 2016, at 8:31, Trond Myklebust wrote:
>>
>>>> On Jul 28, 2016, at 05:47, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>
>>>>
>>>> On 27 Jul 2016, at 14:05, Trond Myklebust wrote:
>>>>
>>>>>> On Jul 27, 2016, at 12:14, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>>>
>>>>>> On 27 Jul 2016, at 8:31, Trond Myklebust wrote:
>>>>>>
>>>>>>>> On Jul 27, 2016, at 08:15, Trond Myklebust <trondmy@primarydata.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>> On Jul 27, 2016, at 07:55, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>>>>>>
>>>>>>>>> After adding more debugging, I see that all of that is working correctly,
>>>>>>>>> but the first LAYOUTCOMMIT is taking the size back down to 4096 from the
>>>>>>>>> last nfs_writeback_done(), and the second LAYOUTCOMMIT never brings it back
>>>>>>>>> up again.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Excellent! Thanks for debugging that.
>>>>>>>>
>>>>>>>>> Now I see that we should be marking the block extents as written atomically with
>>>>>>>>> setting LAYOUTCOMMIT and nfsi->layout->plh_lwb, otherwise a LAYOUTCOMMIT can
>>>>>>>>> collect extents just added from the next bl_write_cleanup(). Then, the next
>>>>>>>>> LAYOUTCOMMIT fails, and all we're left with is the size from the first
>>>>>>>>> LAYOUTCOMMIT. Not sure if that particular problem is the whole fix, but
>>>>>>>>> that's something to work on.
>>>>>>>>>
>>>>>>>>> I see ways to fix that:
>>>>>>>>>
>>>>>>>>> - make a new pnfs_set_layoutcommit_locked() that can be used to call
>>>>>>>>> ext_tree_mark_written() inside the i_lock
>>>>>>>>>
>>>>>>>>> - make another pnfs_layoutdriver_type operation to be used within
>>>>>>>>> pnfs_set_layoutcommit (mark_layoutcommit? set_layoutcommit?), and call
>>>>>>>>> ext_tree_mark_written() within that..
>>>>>>>>>
>>>>>>>>> - have .prepare_layoutcommit return a new positive plh_lwb that would
>>>>>>>>> extend the current LAYOUTCOMMIT
>>>>>>>>>
>>>>>>>>> - make ext_tree_prepare_commit only encode up to plh_lwb
>>>>>>>>
>>>>>>>> I see no reason why ext_tree_prepare_commit() shouldn’t be
>>>>>>>> allowed to extend the args->lastbytewritten. This is a metadata
>>>>>>>> operation that is owned by the pNFS layout driver.
>>>>>>>> The only thing I’d note is you should then rewrite the
>>>>>>>> failure case in pnfs_layoutcommit_inode() so that it doesn’t
>>>>>>>> rely on the saved “end_pos”, but uses args->lastbytewritten
>>>>>>>> instead (with a comment to the effect why)…
>>>>>>>
>>>>>>> In fact, given the potential for races here, I think the right
>>>>>>> thing to do is to have ext_tree_prepare_commit() always set the
>>>>>>> correct value for args->lastbytewritten.
>>>>>>
>>>>>> OK, that has cleared up that common failure case that was getting in the
>>>>>> way, but now it can still fail like this:
>>>>>>
>>>>>
>>>>> Good progress! :-)
>>>>>
>>>>>> nfs_writeback_update_inode sets size 4096 w/ NFS_INO_INVALID_ATTR set, and sets NFS_INO_LAYOUTCOMMIT
>>>>>> 1st nfs_getattr -> pnfs_layoutcommit_inode starts, clears layoutcommit flag sets NFS_INO_LAYOUTCOMMITING
>>>>>> nfs_writeback_update_inode sets size 8192 w/ NFS_INO_INVALID_ATTR set, and sets NFS_INO_LAYOUTCOMMIT
>>>>>> 1st nfs_getattr -> nfs4_layoutcommit_release sets size 4096, NFS_INO_INVALID_ATTR set, clears NFS_INO_LAYOUTCOMMITTING
>>>>>> 1st nfs_getattr -> __revalidate_inode sets size 4096, NFS_INO_INVALID_ATTR not set.. cache is valid
>>>>>> 2nd nfs_getattr immediately returns 4096 even though NFS_INO_LAYOUTCOMMIT
>>>>>
>>>>> Is this being tested on top of the current linux-next/testing?
>>>>> Normally, I’d expect
>>>>> http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=10b7e9ad44881fcd46ac24eb7374377c6e8962ed
>>>>> to cause 1st nfs_getattr() to not declare the cache valid.
>>>>
>>>> Yes, this is on your linux-next branch.
>>>>
>>>> When the 1st nfs_getattr() goes through nfs_update_inode() the second time
>>>> (during __revalidate_inode), NFS_INO_INVALID_ATTR is never set by anything,
>>>> since all the attributes returned match the cache. So even though
>>>> NFS_INO_LAYOUTCOMMIT is set, and the cache_validity variable is "false",
>>>> the NFS_INO_INVALID_ATTR is never set in the "invalid" local variable.
>>>>
>>>> Should pnfs_layoutcommit_outstanding() always set NFS_INO_INVALID_ATTR?
>>>>
>>>> Ben
>>>
>>> nfs_post_op_update_inode_locked() should be doing that as part of
>>> the callchain in nfs_writeback_update_inode().
>>
>> And it is, and the bit persists through the next layoutcommit, it is
>> the next GETATTR response that finds that all the attributes are the
>> same and the bit is cleared.
>>
>
> So what if we require that nfsi->cache_validity be set to
> save_cache_validity & NFS_INO_INVALID_ATTR at a minimum if
> pnfs_layoutcommit_outstanding()?

With this, I am unable to reproduce the problem:

@@ -1665,7 +1684,7 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
        unsigned long now = jiffies;
        unsigned long save_cache_validity;
        bool have_writers = nfs_file_has_buffered_writers(nfsi);
-       bool cache_revalidated;
+       bool cache_revalidated = true;

        dfprintk(VFS, "NFS: %s(%s/%lu fh_crc=0x%08x ct=%d info=0x%x)\n",
                        __func__, inode->i_sb->s_id, inode->i_ino,
@@ -1714,8 +1733,10 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
        /* Do atomic weak cache consistency updates */
        invalid |= nfs_wcc_update_inode(inode, fattr);

-
-       cache_revalidated = !pnfs_layoutcommit_outstanding(inode);
+       if (pnfs_layoutcommit_outstanding(inode)) {
+               nfsi->cache_validity |= save_cache_validity & NFS_INO_INVALID_ATTR;
+               cache_revalidated = false;
+       }

I'll send these two patches along shortly unless otherwise called off..

^ permalink raw reply [flat|nested] 69+ messages in thread
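The effect of the second hunk can be illustrated with a reduced userspace model. This is a sketch under assumptions, not the kernel function: `model_update_inode()`, `struct model_inode`, `MODEL_INVALID_ATTR` and the `apply_fix` switch are hypothetical names introduced here, and the model keeps only the three things the hunk touches (the saved validity bits, the "invalid" accumulator, and the layoutcommit check).

```c
#include <stdbool.h>

/* Model-only bit; the value is illustrative, not the kernel's. */
#define MODEL_INVALID_ATTR 0x1UL

/* Hypothetical stand-in for the per-inode state used by the hunk. */
struct model_inode {
	unsigned long cache_validity;
	bool layoutcommit_outstanding;
};

/*
 * Reduced model of the attribute update path. Returns whether the cache
 * was treated as revalidated. With apply_fix set, a pending layoutcommit
 * carries the previously saved "attributes invalid" bit forward instead
 * of letting a matching GETATTR reply clear it.
 */
static bool model_update_inode(struct model_inode *ino,
			       bool server_attrs_differ, bool apply_fix)
{
	unsigned long invalid = 0;
	unsigned long save_cache_validity = ino->cache_validity;
	bool cache_revalidated = true;

	/* The update starts by assuming the attributes are now valid. */
	ino->cache_validity &= ~MODEL_INVALID_ATTR;

	/* Only a mismatch with the server re-marks the cache invalid. */
	if (server_attrs_differ)
		invalid |= MODEL_INVALID_ATTR;

	if (ino->layoutcommit_outstanding) {
		if (apply_fix)
			ino->cache_validity |=
				save_cache_validity & MODEL_INVALID_ATTR;
		cache_revalidated = false;
	}

	ino->cache_validity |= invalid;
	return cache_revalidated;
}
```

In this model, an inode that enters with `MODEL_INVALID_ATTR` set and a layoutcommit outstanding loses the bit when `apply_fix` is false and the reply matches the cache, and keeps it when `apply_fix` is true, so a following getattr would go back to the server rather than trust the cached size.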
* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics
  2016-07-28 16:40 ` Benjamin Coddington
@ 2016-07-28 16:41 ` Trond Myklebust
  0 siblings, 0 replies; 69+ messages in thread
From: Trond Myklebust @ 2016-07-28 16:41 UTC
To: Coddington Benjamin; +Cc: hch, List Linux NFS Mailing

> On Jul 28, 2016, at 12:40, Benjamin Coddington <bcodding@redhat.com> wrote:
>
> On 28 Jul 2016, at 11:36, Trond Myklebust wrote:
>
>>> On Jul 28, 2016, at 11:33, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>
>>> On 28 Jul 2016, at 8:31, Trond Myklebust wrote:
>>>
>>>>> On Jul 28, 2016, at 05:47, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>>
>>>>>
>>>>> On 27 Jul 2016, at 14:05, Trond Myklebust wrote:
>>>>>
>>>>>>> On Jul 27, 2016, at 12:14, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>>>>
>>>>>>> On 27 Jul 2016, at 8:31, Trond Myklebust wrote:
>>>>>>>
>>>>>>>>> On Jul 27, 2016, at 08:15, Trond Myklebust <trondmy@primarydata.com> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On Jul 27, 2016, at 07:55, Benjamin Coddington <bcodding@redhat.com> wrote:
>>>>>>>>>>
>>>>>>>>>> After adding more debugging, I see that all of that is working correctly,
>>>>>>>>>> but the first LAYOUTCOMMIT is taking the size back down to 4096 from the
>>>>>>>>>> last nfs_writeback_done(), and the second LAYOUTCOMMIT never brings it back
>>>>>>>>>> up again.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Excellent! Thanks for debugging that.
>>>>>>>>>
>>>>>>>>>> Now I see that we should be marking the block extents as written atomically with
>>>>>>>>>> setting LAYOUTCOMMIT and nfsi->layout->plh_lwb, otherwise a LAYOUTCOMMIT can
>>>>>>>>>> collect extents just added from the next bl_write_cleanup().  Then, the next
>>>>>>>>>> LAYOUTCOMMIT fails, and all we're left with is the size from the first
>>>>>>>>>> LAYOUTCOMMIT.  Not sure if that particular problem is the whole fix, but
>>>>>>>>>> that's something to work on.
>>>>>>>>>>
>>>>>>>>>> I see ways to fix that:
>>>>>>>>>>
>>>>>>>>>> - make a new pnfs_set_layoutcommit_locked() that can be used to call
>>>>>>>>>> ext_tree_mark_written() inside the i_lock
>>>>>>>>>>
>>>>>>>>>> - make another pnfs_layoutdriver_type operation to be used within
>>>>>>>>>> pnfs_set_layoutcommit (mark_layoutcommit? set_layoutcommit?), and call
>>>>>>>>>> ext_tree_mark_written() within that..
>>>>>>>>>>
>>>>>>>>>> - have .prepare_layoutcommit return a new positive plh_lwb that would
>>>>>>>>>> extend the current LAYOUTCOMMIT
>>>>>>>>>>
>>>>>>>>>> - make ext_tree_prepare_commit only encode up to plh_lwb
>>>>>>>>>
>>>>>>>>> I see no reason why ext_tree_prepare_commit() shouldn’t be allowed to extend the args->lastbytewritten. This is a metadata operation that is owned by the pNFS layout driver.
>>>>>>>>> The only thing I’d note is you should then rewrite the failure case in pnfs_layoutcommit_inode() so that it doesn’t rely on the saved “end_pos”, but uses args->lastbytewritten instead (with a comment to the effect why)…
>>>>>>>>
>>>>>>>> In fact, given the potential for races here, I think the right thing to do is to have ext_tree_prepare_commit() always set the correct value for args->lastbytewritten.
>>>>>>>
>>>>>>> OK, that has cleared up that common failure case that was getting in the
>>>>>>> way, but now it can still fail like this:
>>>>>>>
>>>>>>
>>>>>> Good progress! :-)
>>>>>>
>>>>>>> nfs_writeback_update_inode sets size 4096 w/ NFS_INO_INVALID_ATTR set, and sets NFS_INO_LAYOUTCOMMIT
>>>>>>> 1st nfs_getattr -> pnfs_layoutcommit_inode starts, clears layoutcommit flag sets NFS_INO_LAYOUTCOMMITING
>>>>>>> nfs_writeback_update_inode sets size 8192 w/ NFS_INO_INVALID_ATTR set, and sets NFS_INO_LAYOUTCOMMIT
>>>>>>> 1st nfs_getattr -> nfs4_layoutcommit_release sets size 4096, NFS_INO_INVALID_ATTR set, clears NFS_INO_LAYOUTCOMMITTING
>>>>>>> 1st nfs_getattr -> __revalidate_inode sets size 4096, NFS_INO_INVALID_ATTR not set.. cache is valid
>>>>>>> 2nd nfs_getattr immediately returns 4096 even though NFS_INO_LAYOUTCOMMIT
>>>>>>
>>>>>> Is this being tested on top of the current linux-next/testing? Normally, I’d expect http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=10b7e9ad44881fcd46ac24eb7374377c6e8962ed to cause 1st nfs_getattr() to not declare the cache valid.
>>>>>
>>>>> Yes, this is on your linux-next branch.
>>>>>
>>>>> When the 1st nfs_getattr() goes through nfs_update_inode() the second time
>>>>> (during __revalidate_inode), NFS_INO_INVALID_ATTR is never set by anything,
>>>>> since all the attributes returned match the cache.  So even though
>>>>> NFS_INO_LAYOUTCOMMIT is set, and the cache_validity variable is "false",
>>>>> the NFS_INO_INVALID_ATTR is never set in the "invalid" local variable.
>>>>>
>>>>> Should pnfs_layoutcommit_outstanding() always set NFS_INO_INVALID_ATTR?
>>>>>
>>>>> Ben
>>>>
>>>> nfs_post_op_update_inode_locked() should be doing that as part of the callchain in nfs_writeback_update_inode().
>>>
>>> And it is, and the bit persists through the next layoutcommit, it is the next GETATTR response that finds that all the attributes are the same and the bit is cleared.
>>>
>>
>> So what if we require that nfsi->cache_validity be set to save_cache_validity & NFS_INO_INVALID_ATTR at a minimum if pnfs_layoutcommit_outstanding()?
>
> With this, I am unable to reproduce the problem:
>
> @@ -1665,7 +1684,7 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
>        unsigned long now = jiffies;
>        unsigned long save_cache_validity;
>        bool have_writers = nfs_file_has_buffered_writers(nfsi);
> -       bool cache_revalidated;
> +       bool cache_revalidated = true;
>
>        dfprintk(VFS, "NFS: %s(%s/%lu fh_crc=0x%08x ct=%d info=0x%x)\n",
>                         __func__, inode->i_sb->s_id, inode->i_ino,
> @@ -1714,8 +1733,10 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
>        /* Do atomic weak cache consistency updates */
>        invalid |= nfs_wcc_update_inode(inode, fattr);
>
> -
> -       cache_revalidated = !pnfs_layoutcommit_outstanding(inode);
> +       if (pnfs_layoutcommit_outstanding(inode)) {
> +               nfsi->cache_validity |= save_cache_validity & NFS_INO_INVALID_ATTR;
> +               cache_revalidated = false;
> +       }
>
> I'll send these two patches along shortly unless otherwise called off..

That looks just fine. Thanks for your work on this!
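Editorial sketch (not part of the thread): the change discussed in this message keeps the attribute cache marked invalid while a LAYOUTCOMMIT is still outstanding, even when a GETATTR reply matches every cached attribute. The user-space model below only illustrates that bit logic. The names prefixed with sketch_ or SKETCH_, the flag value, and the assumption that the update begins by clearing the invalid bit are all hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for NFS_INO_INVALID_ATTR; the value is illustrative. */
#define SKETCH_INO_INVALID_ATTR 0x0001UL

struct sketch_inode {
	unsigned long cache_validity;   /* models nfsi->cache_validity */
	bool layoutcommit_outstanding;  /* models pnfs_layoutcommit_outstanding() */
};

/*
 * Models an attribute update in which every returned attribute matched
 * the cache. Returns true if the cache may be treated as revalidated.
 */
static bool sketch_update_inode(struct sketch_inode *ino)
{
	unsigned long save_cache_validity = ino->cache_validity;
	bool cache_revalidated = true;

	/* Assumption: the update starts by clearing the "invalid" bit. */
	ino->cache_validity &= ~SKETCH_INO_INVALID_ATTR;

	if (ino->layoutcommit_outstanding) {
		/* Carry the saved bit forward instead of losing it. */
		ino->cache_validity |= save_cache_validity & SKETCH_INO_INVALID_ATTR;
		cache_revalidated = false;
	}
	return cache_revalidated;
}

#ifdef SKETCH_DEMO_MAIN
int main(void)
{
	struct sketch_inode ino = { SKETCH_INO_INVALID_ATTR, true };
	bool ok = sketch_update_inode(&ino);

	assert(!ok);
	printf("revalidated=%d invalid_attr=%lu\n", ok,
	       ino.cache_validity & SKETCH_INO_INVALID_ATTR);
	return 0;
}
#endif
```

In this model a later getattr would still see the invalid bit and go back to the server, which is the behaviour the thread is after. Build with -DSKETCH_DEMO_MAIN to run the small demo.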
* Re: [PATCH v4 24/28] Getattr doesn't require data sync semantics
  2016-07-19 20:00 ` [PATCH v4 24/28] " Benjamin Coddington
  2016-07-19 20:06 ` Trond Myklebust
@ 2016-07-19 20:09 ` Benjamin Coddington
  1 sibling, 0 replies; 69+ messages in thread
From: Benjamin Coddington @ 2016-07-19 20:09 UTC
To: hch; +Cc: Trond Myklebust, linux-nfs

On 19 Jul 2016, at 16:00, Benjamin Coddington wrote:

> On 18 Jul 2016, at 23:58, hch@infradead.org wrote:
>
>> On Mon, Jul 18, 2016 at 04:59:09AM +0000, Trond Myklebust wrote:
>>> Actually... The problem might be that a previous attribute update is
>>> marking the attribute cache as being revalidated. Does the following
>>> patch help?
>>
>> It doesn't. Also with your most recent linux-next branch the test
>> now cause the systems to OOM with or without your patch (with mine it's
>> still fine). I tested with your writeback branch from about two or
>> three days ago before, and with that + your patch it also 'just fails'
>> and doesn't OOM. Looks like whatever causes the bug also creates
>> a temporarily memory leak when combined with recent changes from your
>> tree, most likely something from the pnfs branch.
>
> I couldn't find the memory leak using kmemleak, but it OOMs pretty quick. If I
> insert an mdelay(200) just after the lookup_again: marker in
> pnfs_update_layout() it doesn't OOM, but it seems stuck forever in a loop on
> that marker:
>
> [ 1230.635586] pnfs_find_alloc_layout Begin ino=ffff88003ef986f8 layout=ffff8800392bca58
> [ 1230.636729] pnfs_find_lseg:Begin
> [ 1230.637538] pnfs_find_lseg:Return lseg (null) ref 0
> [ 1230.638582] --> send_layoutget
> [ 1230.639499] --> nfs4_proc_layoutget
> [ 1230.640525] --> nfs4_layoutget_prepare
> [ 1230.641479] --> nfs41_setup_sequence
> [ 1230.641581] <-- nfs4_proc_layoutget status=-512
> [ 1230.643288] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=31
> [ 1230.644348] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
> [ 1230.645373] <-- nfs41_setup_sequence slotid=0 seqid=4376
> [ 1230.646356] <-- nfs4_layoutget_prepare
> [ 1230.647357] encode_sequence: sessionid=1468956665:2:3:0 seqid=4376 slotid=0 max_slotid=0 cache_this=0
> [ 1230.648522] encode_layoutget: 1st type:0x5 iomode:2 off:122880 len:4096 mc:4096
> [ 1230.650182] decode_layoutget roff:122880 rlen:4096 riomode:2, lo_type:0x5, lo.len:48
> [ 1230.651331] --> nfs4_layoutget_done
> [ 1230.652233] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=31
> [ 1230.653409] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
> [ 1230.654547] nfs4_free_slot: slotid 1 highest_used_slotid 0
> [ 1230.655606] nfs41_sequence_done: Error 0 free the slot
> [ 1230.656635] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
> [ 1230.657739] <-- nfs4_layoutget_done
> [ 1230.658650] --> nfs4_layoutget_release
> [ 1230.659626] <-- nfs4_layoutget_release
>
> This debug output is identical for every cycle of the loop.

Except for the monotonically incrementing sequence id! sorry.. :/

Ben
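Editorial sketch (not part of the thread): the slot lines in the log above can be read as a bitmap of in-use slots plus a "highest used" index, where 4294967295 is the unsigned 32-bit representation of -1 and appears whenever no slot is in use. The model below is a hypothetical user-space reconstruction that reproduces the logged numbers. It is not the kernel's nfs4_alloc_slot() or nfs4_free_slot(), and every sketch_ or SKETCH_ name is invented for illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Sentinel for "no slot": (uint32_t)-1, printed as 4294967295. */
#define SKETCH_NO_SLOT UINT32_MAX

struct sketch_slot_table {
	uint32_t used_slots;    /* bit i set means slot i is in use */
	uint32_t highest_used;  /* SKETCH_NO_SLOT when the table is empty */
	uint32_t max_slots;
};

/* Index of the highest set bit, or SKETCH_NO_SLOT for an empty bitmap. */
static uint32_t sketch_highest(uint32_t bitmap)
{
	uint32_t hi = SKETCH_NO_SLOT;

	for (uint32_t i = 0; i < 32; i++)
		if (bitmap & (UINT32_C(1) << i))
			hi = i;
	return hi;
}

/* Take the lowest free slot; returns SKETCH_NO_SLOT if none is free. */
static uint32_t sketch_alloc_slot(struct sketch_slot_table *t)
{
	for (uint32_t i = 0; i < t->max_slots && i < 32; i++) {
		if (!(t->used_slots & (UINT32_C(1) << i))) {
			t->used_slots |= UINT32_C(1) << i;
			t->highest_used = sketch_highest(t->used_slots);
			return i;
		}
	}
	return SKETCH_NO_SLOT;
}

/* Release a slot and recompute the highest used index. */
static void sketch_free_slot(struct sketch_slot_table *t, uint32_t slotid)
{
	if (slotid >= 32)
		return;
	t->used_slots &= ~(UINT32_C(1) << slotid);
	t->highest_used = sketch_highest(t->used_slots);
}

#ifdef SKETCH_DEMO_MAIN
int main(void)
{
	struct sketch_slot_table t = { 0, SKETCH_NO_SLOT, 31 };
	uint32_t id = sketch_alloc_slot(&t);

	assert(id == 0);
	printf("used_slots=%04" PRIx32 " highest_used=%" PRIu32 " slotid=%" PRIu32 "\n",
	       t.used_slots, t.highest_used, id);
	sketch_free_slot(&t, id);
	printf("highest_used_slotid %" PRIu32 "\n", t.highest_used);
	return 0;
}
#endif
```

Read this way, each cycle of the loop allocates and then frees its slots cleanly, so the slot table itself is not what keeps the LAYOUTGET retrying. Build with -DSKETCH_DEMO_MAIN to run the small demo.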
end of thread, other threads:[~2016-07-28 16:41 UTC | newest]

Thread overview: 69+ messages
2016-07-06 22:29 [PATCH v4 00/28] NFS writeback performance patches for v4.8 Trond Myklebust
2016-07-06 22:29 ` [PATCH v4 01/28] NFS: Don't flush caches for a getattr that races with writeback Trond Myklebust
2016-07-06 22:29 ` [PATCH v4 02/28] NFS: Cache access checks more aggressively Trond Myklebust
2016-07-06 22:29 ` [PATCH v4 03/28] NFS: Cache aggressively when file is open for writing Trond Myklebust
2016-07-06 22:29 ` [PATCH v4 04/28] NFS: Kill NFS_INO_NFS_INO_FLUSHING: it is a performance killer Trond Myklebust
2016-07-06 22:29 ` [PATCH v4 05/28] NFS: writepage of a single page should not be synchronous Trond Myklebust
2016-07-06 22:29 ` [PATCH v4 06/28] NFS: Don't hold the inode lock across fsync() Trond Myklebust
2016-07-06 22:29 ` [PATCH v4 07/28] NFS: Don't call COMMIT in ->releasepage() Trond Myklebust
2016-07-06 22:29 ` [PATCH v4 08/28] pNFS/files: Fix layoutcommit after a commit to DS Trond Myklebust
2016-07-06 22:29 ` [PATCH v4 09/28] pNFS/flexfiles: " Trond Myklebust
2016-07-06 22:29 ` [PATCH v4 10/28] pNFS/flexfiles: Clean up calls to pnfs_set_layoutcommit() Trond Myklebust
2016-07-06 22:29 ` [PATCH v4 11/28] pNFS: Files and flexfiles always need to commit before layoutcommit Trond Myklebust
2016-07-06 22:29 ` [PATCH v4 12/28] pNFS: Ensure we layoutcommit before revalidating attributes Trond Myklebust
2016-07-06 22:29 ` [PATCH v4 13/28] pNFS: pnfs_layoutcommit_outstanding() is no longer used when !CONFIG_NFS_V4_1 Trond Myklebust
2016-07-06 22:29 ` [PATCH v4 14/28] NFS: Fix O_DIRECT verifier problems Trond Myklebust
2016-07-06 22:29 ` [PATCH v4 15/28] NFS: Ensure we reset the write verifier 'committed' value on resend Trond Myklebust
2016-07-06 22:29 ` [PATCH v4 16/28] NFS: Remove racy size manipulations in O_DIRECT Trond Myklebust
2016-07-06 22:29 ` [PATCH v4 17/28] NFS Cleanup: move call to generic_write_checks() into fs/nfs/direct.c Trond Myklebust
2016-07-06 22:29 ` [PATCH v4 18/28] NFS: Move buffered I/O locking into nfs_file_write() Trond Myklebust
2016-07-06 22:29 ` [PATCH v4 19/28] NFS: Do not serialise O_DIRECT reads and writes Trond Myklebust
2016-07-06 22:29 ` [PATCH v4 20/28] NFS: Cleanup nfs_direct_complete() Trond Myklebust
2016-07-06 22:29 ` [PATCH v4 21/28] NFS: Remove redundant waits for O_DIRECT in fsync() and write_begin() Trond Myklebust
2016-07-06 22:29 ` [PATCH v4 22/28] NFS: Remove unused function nfs_revalidate_mapping_protected() Trond Myklebust
2016-07-06 22:30 ` [PATCH v4 23/28] NFS: Do not aggressively cache file attributes in the case of O_DIRECT Trond Myklebust
2016-07-06 22:30 ` [PATCH v4 24/28] NFS: Getattr doesn't require data sync semantics Trond Myklebust
2016-07-06 22:30 ` [PATCH v4 25/28] NFSv4.2: Fix a race in nfs42_proc_deallocate() Trond Myklebust
2016-07-06 22:30 ` [PATCH v4 26/28] NFSv4.2: Fix writeback races in nfs4_copy_file_range Trond Myklebust
2016-07-06 22:30 ` [PATCH v4 27/28] NFSv4.2: llseek(SEEK_HOLE) and llseek(SEEK_DATA) don't require data sync Trond Myklebust
2016-07-06 22:30 ` [PATCH v4 28/28] NFS nfs_vm_page_mkwrite: Don't freeze me, Bro Trond Myklebust
2016-07-18 3:48 ` [PATCH v4 24/28] NFS: Getattr doesn't require data sync semantics Christoph Hellwig
2016-07-18 4:32 ` Trond Myklebust
2016-07-18 4:59 ` Trond Myklebust
2016-07-19 3:58 ` hch
2016-07-19 20:00 ` [PATCH v4 24/28] " Benjamin Coddington
2016-07-19 20:06 ` Trond Myklebust
2016-07-20 15:03 ` Benjamin Coddington
2016-07-21 8:22 ` hch
2016-07-21 8:32 ` Benjamin Coddington
2016-07-21 9:10 ` Benjamin Coddington
2016-07-21 9:52 ` Benjamin Coddington
2016-07-21 12:46 ` Trond Myklebust
2016-07-21 13:05 ` Benjamin Coddington
2016-07-21 13:20 ` Trond Myklebust
2016-07-21 14:00 ` Trond Myklebust
2016-07-21 14:02 ` Benjamin Coddington
2016-07-25 16:26 ` Benjamin Coddington
2016-07-25 16:39 ` Trond Myklebust
2016-07-25 18:26 ` Benjamin Coddington
2016-07-25 18:34 ` Trond Myklebust
2016-07-25 18:41 ` Benjamin Coddington
2016-07-26 16:32 ` Benjamin Coddington
2016-07-26 16:35 ` Trond Myklebust
2016-07-26 17:57 ` Benjamin Coddington
2016-07-26 18:07 ` Trond Myklebust
2016-07-27 11:55 ` Benjamin Coddington
2016-07-27 12:15 ` Trond Myklebust
2016-07-27 12:31 ` Trond Myklebust
2016-07-27 16:14 ` Benjamin Coddington
2016-07-27 18:05 ` Trond Myklebust
2016-07-28 9:47 ` Benjamin Coddington
2016-07-28 12:31 ` Trond Myklebust
2016-07-28 14:04 ` Trond Myklebust
2016-07-28 15:38 ` Benjamin Coddington
2016-07-28 15:39 ` Trond Myklebust
2016-07-28 15:33 ` Benjamin Coddington
2016-07-28 15:36 ` Trond Myklebust
2016-07-28 16:40 ` Benjamin Coddington
2016-07-28 16:41 ` Trond Myklebust
2016-07-19 20:09 ` Benjamin Coddington