* [PATCH AUTOSEL 5.7 081/388] nfsd: Fix svc_xprt refcnt leak when setup callback client failed
[not found] <20200618010805.600873-1-sashal@kernel.org>
@ 2020-06-18 1:02 ` Sasha Levin
2020-06-18 1:06 ` [PATCH AUTOSEL 5.7 295/388] net: sunrpc: Fix off-by-one issues in 'rpc_ntop6' Sasha Levin
` (5 subsequent siblings)
6 siblings, 0 replies; 7+ messages in thread
From: Sasha Levin @ 2020-06-18 1:02 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Xiyu Yang, Xin Tan, J . Bruce Fields, Sasha Levin, linux-nfs
From: Xiyu Yang <xiyuyang19@fudan.edu.cn>
[ Upstream commit a4abc6b12eb1f7a533c2e7484cfa555454ff0977 ]
nfsd4_process_cb_update() invokes svc_xprt_get(), which increases the
refcount of the "c->cn_xprt".
The reference counting issue happens in one exception handling path of
nfsd4_process_cb_update(). When setup callback client failed, the
function forgets to decrease the refcnt increased by svc_xprt_get(),
causing a refcnt leak.
Fix this issue by calling svc_xprt_put() when setup callback client
failed.
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/nfsd/nfs4callback.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index 5cf91322de0f..07e0c6f6322f 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -1301,6 +1301,8 @@ static void nfsd4_process_cb_update(struct nfsd4_callback *cb)
err = setup_callback_client(clp, &conn, ses);
if (err) {
nfsd4_mark_cb_down(clp, err);
+ if (c)
+ svc_xprt_put(c->cn_xprt);
return;
}
}
--
2.25.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH AUTOSEL 5.7 295/388] net: sunrpc: Fix off-by-one issues in 'rpc_ntop6'
[not found] <20200618010805.600873-1-sashal@kernel.org>
2020-06-18 1:02 ` [PATCH AUTOSEL 5.7 081/388] nfsd: Fix svc_xprt refcnt leak when setup callback client failed Sasha Levin
@ 2020-06-18 1:06 ` Sasha Levin
2020-06-18 1:06 ` [PATCH AUTOSEL 5.7 296/388] NFSv4.1 fix rpc_call_done assignment for BIND_CONN_TO_SESSION Sasha Levin
` (4 subsequent siblings)
6 siblings, 0 replies; 7+ messages in thread
From: Sasha Levin @ 2020-06-18 1:06 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Fedor Tokarev, Anna Schumaker, Sasha Levin, linux-nfs, netdev
From: Fedor Tokarev <ftokarev@gmail.com>
[ Upstream commit 118917d696dc59fd3e1741012c2f9db2294bed6f ]
Fix off-by-one issues in 'rpc_ntop6':
- 'snprintf' returns the number of characters which would have been
written if enough space had been available, excluding the terminating
null byte. Thus, a return value of 'sizeof(scopebuf)' means that the
last character was dropped.
- 'strcat' adds a terminating null byte to the string, thus if len ==
buflen, the null byte is written past the end of the buffer.
Signed-off-by: Fedor Tokarev <ftokarev@gmail.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/sunrpc/addr.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/sunrpc/addr.c b/net/sunrpc/addr.c
index 8b4d72b1a066..010dcb876f9d 100644
--- a/net/sunrpc/addr.c
+++ b/net/sunrpc/addr.c
@@ -82,11 +82,11 @@ static size_t rpc_ntop6(const struct sockaddr *sap,
rc = snprintf(scopebuf, sizeof(scopebuf), "%c%u",
IPV6_SCOPE_DELIMITER, sin6->sin6_scope_id);
- if (unlikely((size_t)rc > sizeof(scopebuf)))
+ if (unlikely((size_t)rc >= sizeof(scopebuf)))
return 0;
len += rc;
- if (unlikely(len > buflen))
+ if (unlikely(len >= buflen))
return 0;
strcat(buf, scopebuf);
--
2.25.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH AUTOSEL 5.7 296/388] NFSv4.1 fix rpc_call_done assignment for BIND_CONN_TO_SESSION
[not found] <20200618010805.600873-1-sashal@kernel.org>
2020-06-18 1:02 ` [PATCH AUTOSEL 5.7 081/388] nfsd: Fix svc_xprt refcnt leak when setup callback client failed Sasha Levin
2020-06-18 1:06 ` [PATCH AUTOSEL 5.7 295/388] net: sunrpc: Fix off-by-one issues in 'rpc_ntop6' Sasha Levin
@ 2020-06-18 1:06 ` Sasha Levin
2020-06-18 1:06 ` [PATCH AUTOSEL 5.7 316/388] nfsd4: make drc_slab global, not per-net Sasha Levin
` (3 subsequent siblings)
6 siblings, 0 replies; 7+ messages in thread
From: Sasha Levin @ 2020-06-18 1:06 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Olga Kornievskaia, Olga Kornievskaia, Anna Schumaker,
Sasha Levin, linux-nfs
From: Olga Kornievskaia <olga.kornievskaia@gmail.com>
[ Upstream commit 1c709b766e73e54d64b1dde1b7cfbcf25bcb15b9 ]
Fixes: 02a95dee8cf0 ("NFS add callback_ops to nfs4_proc_bind_conn_to_session_callback")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/nfs/nfs4proc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 9056f3dd380e..e32717fd1169 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -7909,7 +7909,7 @@ nfs4_bind_one_conn_to_session_done(struct rpc_task *task, void *calldata)
}
static const struct rpc_call_ops nfs4_bind_one_conn_to_session_ops = {
- .rpc_call_done = &nfs4_bind_one_conn_to_session_done,
+ .rpc_call_done = nfs4_bind_one_conn_to_session_done,
};
/*
--
2.25.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH AUTOSEL 5.7 316/388] nfsd4: make drc_slab global, not per-net
[not found] <20200618010805.600873-1-sashal@kernel.org>
` (2 preceding siblings ...)
2020-06-18 1:06 ` [PATCH AUTOSEL 5.7 296/388] NFSv4.1 fix rpc_call_done assignment for BIND_CONN_TO_SESSION Sasha Levin
@ 2020-06-18 1:06 ` Sasha Levin
2020-06-18 1:07 ` [PATCH AUTOSEL 5.7 327/388] nfsd: safer handling of corrupted c_type Sasha Levin
` (2 subsequent siblings)
6 siblings, 0 replies; 7+ messages in thread
From: Sasha Levin @ 2020-06-18 1:06 UTC (permalink / raw)
To: linux-kernel, stable
Cc: J. Bruce Fields, kernel test robot, Sasha Levin, linux-nfs
From: "J. Bruce Fields" <bfields@redhat.com>
[ Upstream commit 027690c75e8fd91b60a634d31c4891a6e39d45bd ]
I made every global per-network-namespace instead. But perhaps doing
that to this slab was a step too far.
The kmem_cache_create call in our net init method also seems to be
responsible for this lockdep warning:
[ 45.163710] Unable to find swap-space signature
[ 45.375718] trinity-c1 (855): attempted to duplicate a private mapping with mremap. This is not supported.
[ 46.055744] futex_wake_op: trinity-c1 tries to shift op by -209; fix this program
[ 51.011723]
[ 51.013378] ======================================================
[ 51.013875] WARNING: possible circular locking dependency detected
[ 51.014378] 5.2.0-rc2 #1 Not tainted
[ 51.014672] ------------------------------------------------------
[ 51.015182] trinity-c2/886 is trying to acquire lock:
[ 51.015593] 000000005405f099 (slab_mutex){+.+.}, at: slab_attr_store+0xa2/0x130
[ 51.016190]
[ 51.016190] but task is already holding lock:
[ 51.016652] 00000000ac662005 (kn->count#43){++++}, at: kernfs_fop_write+0x286/0x500
[ 51.017266]
[ 51.017266] which lock already depends on the new lock.
[ 51.017266]
[ 51.017909]
[ 51.017909] the existing dependency chain (in reverse order) is:
[ 51.018497]
[ 51.018497] -> #1 (kn->count#43){++++}:
[ 51.018956] __lock_acquire+0x7cf/0x1a20
[ 51.019317] lock_acquire+0x17d/0x390
[ 51.019658] __kernfs_remove+0x892/0xae0
[ 51.020020] kernfs_remove_by_name_ns+0x78/0x110
[ 51.020435] sysfs_remove_link+0x55/0xb0
[ 51.020832] sysfs_slab_add+0xc1/0x3e0
[ 51.021332] __kmem_cache_create+0x155/0x200
[ 51.021720] create_cache+0xf5/0x320
[ 51.022054] kmem_cache_create_usercopy+0x179/0x320
[ 51.022486] kmem_cache_create+0x1a/0x30
[ 51.022867] nfsd_reply_cache_init+0x278/0x560
[ 51.023266] nfsd_init_net+0x20f/0x5e0
[ 51.023623] ops_init+0xcb/0x4b0
[ 51.023928] setup_net+0x2fe/0x670
[ 51.024315] copy_net_ns+0x30a/0x3f0
[ 51.024653] create_new_namespaces+0x3c5/0x820
[ 51.025257] unshare_nsproxy_namespaces+0xd1/0x240
[ 51.025881] ksys_unshare+0x506/0x9c0
[ 51.026381] __x64_sys_unshare+0x3a/0x50
[ 51.026937] do_syscall_64+0x110/0x10b0
[ 51.027509] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 51.028175]
[ 51.028175] -> #0 (slab_mutex){+.+.}:
[ 51.028817] validate_chain+0x1c51/0x2cc0
[ 51.029422] __lock_acquire+0x7cf/0x1a20
[ 51.029947] lock_acquire+0x17d/0x390
[ 51.030438] __mutex_lock+0x100/0xfa0
[ 51.030995] mutex_lock_nested+0x27/0x30
[ 51.031516] slab_attr_store+0xa2/0x130
[ 51.032020] sysfs_kf_write+0x11d/0x180
[ 51.032529] kernfs_fop_write+0x32a/0x500
[ 51.033056] do_loop_readv_writev+0x21d/0x310
[ 51.033627] do_iter_write+0x2e5/0x380
[ 51.034148] vfs_writev+0x170/0x310
[ 51.034616] do_pwritev+0x13e/0x160
[ 51.035100] __x64_sys_pwritev+0xa3/0x110
[ 51.035633] do_syscall_64+0x110/0x10b0
[ 51.036200] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 51.036924]
[ 51.036924] other info that might help us debug this:
[ 51.036924]
[ 51.037876] Possible unsafe locking scenario:
[ 51.037876]
[ 51.038556] CPU0 CPU1
[ 51.039130] ---- ----
[ 51.039676] lock(kn->count#43);
[ 51.040084] lock(slab_mutex);
[ 51.040597] lock(kn->count#43);
[ 51.041062] lock(slab_mutex);
[ 51.041320]
[ 51.041320] *** DEADLOCK ***
[ 51.041320]
[ 51.041793] 3 locks held by trinity-c2/886:
[ 51.042128] #0: 000000001f55e152 (sb_writers#5){.+.+}, at: vfs_writev+0x2b9/0x310
[ 51.042739] #1: 00000000c7d6c034 (&of->mutex){+.+.}, at: kernfs_fop_write+0x25b/0x500
[ 51.043400] #2: 00000000ac662005 (kn->count#43){++++}, at: kernfs_fop_write+0x286/0x500
Reported-by: kernel test robot <lkp@intel.com>
Fixes: 3ba75830ce17 "drc containerization"
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/nfsd/cache.h | 2 ++
fs/nfsd/netns.h | 1 -
fs/nfsd/nfscache.c | 29 +++++++++++++++++------------
fs/nfsd/nfsctl.c | 6 ++++++
4 files changed, 25 insertions(+), 13 deletions(-)
diff --git a/fs/nfsd/cache.h b/fs/nfsd/cache.h
index 10ec5ecdf117..65c331f75e9c 100644
--- a/fs/nfsd/cache.h
+++ b/fs/nfsd/cache.h
@@ -78,6 +78,8 @@ enum {
/* Checksum this amount of the request */
#define RC_CSUMLEN (256U)
+int nfsd_drc_slab_create(void);
+void nfsd_drc_slab_free(void);
int nfsd_reply_cache_init(struct nfsd_net *);
void nfsd_reply_cache_shutdown(struct nfsd_net *);
int nfsd_cache_lookup(struct svc_rqst *);
diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
index 09aa545825bd..9217cb64bf0e 100644
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -139,7 +139,6 @@ struct nfsd_net {
* Duplicate reply cache
*/
struct nfsd_drc_bucket *drc_hashtbl;
- struct kmem_cache *drc_slab;
/* max number of entries allowed in the cache */
unsigned int max_drc_entries;
diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c
index 96352ab7bd81..0c10bfea039e 100644
--- a/fs/nfsd/nfscache.c
+++ b/fs/nfsd/nfscache.c
@@ -36,6 +36,8 @@ struct nfsd_drc_bucket {
spinlock_t cache_lock;
};
+static struct kmem_cache *drc_slab;
+
static int nfsd_cache_append(struct svc_rqst *rqstp, struct kvec *vec);
static unsigned long nfsd_reply_cache_count(struct shrinker *shrink,
struct shrink_control *sc);
@@ -95,7 +97,7 @@ nfsd_reply_cache_alloc(struct svc_rqst *rqstp, __wsum csum,
{
struct svc_cacherep *rp;
- rp = kmem_cache_alloc(nn->drc_slab, GFP_KERNEL);
+ rp = kmem_cache_alloc(drc_slab, GFP_KERNEL);
if (rp) {
rp->c_state = RC_UNUSED;
rp->c_type = RC_NOCACHE;
@@ -129,7 +131,7 @@ nfsd_reply_cache_free_locked(struct nfsd_drc_bucket *b, struct svc_cacherep *rp,
atomic_dec(&nn->num_drc_entries);
nn->drc_mem_usage -= sizeof(*rp);
}
- kmem_cache_free(nn->drc_slab, rp);
+ kmem_cache_free(drc_slab, rp);
}
static void
@@ -141,6 +143,18 @@ nfsd_reply_cache_free(struct nfsd_drc_bucket *b, struct svc_cacherep *rp,
spin_unlock(&b->cache_lock);
}
+int nfsd_drc_slab_create(void)
+{
+ drc_slab = kmem_cache_create("nfsd_drc",
+ sizeof(struct svc_cacherep), 0, 0, NULL);
+ return drc_slab ? 0: -ENOMEM;
+}
+
+void nfsd_drc_slab_free(void)
+{
+ kmem_cache_destroy(drc_slab);
+}
+
int nfsd_reply_cache_init(struct nfsd_net *nn)
{
unsigned int hashsize;
@@ -159,18 +173,13 @@ int nfsd_reply_cache_init(struct nfsd_net *nn)
if (status)
goto out_nomem;
- nn->drc_slab = kmem_cache_create("nfsd_drc",
- sizeof(struct svc_cacherep), 0, 0, NULL);
- if (!nn->drc_slab)
- goto out_shrinker;
-
nn->drc_hashtbl = kcalloc(hashsize,
sizeof(*nn->drc_hashtbl), GFP_KERNEL);
if (!nn->drc_hashtbl) {
nn->drc_hashtbl = vzalloc(array_size(hashsize,
sizeof(*nn->drc_hashtbl)));
if (!nn->drc_hashtbl)
- goto out_slab;
+ goto out_shrinker;
}
for (i = 0; i < hashsize; i++) {
@@ -180,8 +189,6 @@ int nfsd_reply_cache_init(struct nfsd_net *nn)
nn->drc_hashsize = hashsize;
return 0;
-out_slab:
- kmem_cache_destroy(nn->drc_slab);
out_shrinker:
unregister_shrinker(&nn->nfsd_reply_cache_shrinker);
out_nomem:
@@ -209,8 +216,6 @@ void nfsd_reply_cache_shutdown(struct nfsd_net *nn)
nn->drc_hashtbl = NULL;
nn->drc_hashsize = 0;
- kmem_cache_destroy(nn->drc_slab);
- nn->drc_slab = NULL;
}
/*
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 3bb2db947d29..71687d99b090 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -1533,6 +1533,9 @@ static int __init init_nfsd(void)
goto out_free_slabs;
nfsd_fault_inject_init(); /* nfsd fault injection controls */
nfsd_stat_init(); /* Statistics */
+ retval = nfsd_drc_slab_create();
+ if (retval)
+ goto out_free_stat;
nfsd_lockd_init(); /* lockd->nfsd callbacks */
retval = create_proc_exports_entry();
if (retval)
@@ -1546,6 +1549,8 @@ static int __init init_nfsd(void)
remove_proc_entry("fs/nfs", NULL);
out_free_lockd:
nfsd_lockd_shutdown();
+ nfsd_drc_slab_free();
+out_free_stat:
nfsd_stat_shutdown();
nfsd_fault_inject_cleanup();
nfsd4_exit_pnfs();
@@ -1560,6 +1565,7 @@ static int __init init_nfsd(void)
static void __exit exit_nfsd(void)
{
+ nfsd_drc_slab_free();
remove_proc_entry("fs/nfs/exports", NULL);
remove_proc_entry("fs/nfs", NULL);
nfsd_stat_shutdown();
--
2.25.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH AUTOSEL 5.7 327/388] nfsd: safer handling of corrupted c_type
[not found] <20200618010805.600873-1-sashal@kernel.org>
` (3 preceding siblings ...)
2020-06-18 1:06 ` [PATCH AUTOSEL 5.7 316/388] nfsd4: make drc_slab global, not per-net Sasha Levin
@ 2020-06-18 1:07 ` Sasha Levin
2020-06-18 1:07 ` [PATCH AUTOSEL 5.7 380/388] nfs: set invalid blocks after NFSv4 writes Sasha Levin
2020-06-18 1:07 ` [PATCH AUTOSEL 5.7 381/388] NFS: Fix direct WRITE throughput regression Sasha Levin
6 siblings, 0 replies; 7+ messages in thread
From: Sasha Levin @ 2020-06-18 1:07 UTC (permalink / raw)
To: linux-kernel, stable; +Cc: J. Bruce Fields, Sasha Levin, linux-nfs
From: "J. Bruce Fields" <bfields@redhat.com>
[ Upstream commit c25bf185e57213b54ea0d632ac04907310993433 ]
This can only happen if there's a bug somewhere, so let's make it a WARN
not a printk. Also, I think it's safest to ignore the corruption rather
than trying to fix it by removing a cache entry.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/nfsd/nfscache.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c
index 0c10bfea039e..4a258065188e 100644
--- a/fs/nfsd/nfscache.c
+++ b/fs/nfsd/nfscache.c
@@ -469,8 +469,7 @@ nfsd_cache_lookup(struct svc_rqst *rqstp)
rtn = RC_REPLY;
break;
default:
- printk(KERN_WARNING "nfsd: bad repcache type %d\n", rp->c_type);
- nfsd_reply_cache_free_locked(b, rp, nn);
+ WARN_ONCE(1, "nfsd: bad repcache type %d\n", rp->c_type);
}
goto out;
--
2.25.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH AUTOSEL 5.7 380/388] nfs: set invalid blocks after NFSv4 writes
[not found] <20200618010805.600873-1-sashal@kernel.org>
` (4 preceding siblings ...)
2020-06-18 1:07 ` [PATCH AUTOSEL 5.7 327/388] nfsd: safer handling of corrupted c_type Sasha Levin
@ 2020-06-18 1:07 ` Sasha Levin
2020-06-18 1:07 ` [PATCH AUTOSEL 5.7 381/388] NFS: Fix direct WRITE throughput regression Sasha Levin
6 siblings, 0 replies; 7+ messages in thread
From: Sasha Levin @ 2020-06-18 1:07 UTC (permalink / raw)
To: linux-kernel, stable; +Cc: Zheng Bin, Anna Schumaker, Sasha Levin, linux-nfs
From: Zheng Bin <zhengbin13@huawei.com>
[ Upstream commit 3a39e778690500066b31fe982d18e2e394d3bce2 ]
Use the following command to test nfsv4(size of file1M is 1MB):
mount -t nfs -o vers=4.0,actimeo=60 127.0.0.1/dir1 /mnt
cp file1M /mnt
du -h /mnt/file1M -->0 within 60s, then 1M
When write is done(cp file1M /mnt), will call this:
nfs_writeback_done
nfs4_write_done
nfs4_write_done_cb
nfs_writeback_update_inode
nfs_post_op_update_inode_force_wcc_locked(change, ctime, mtime
nfs_post_op_update_inode_force_wcc_locked
nfs_set_cache_invalid
nfs_refresh_inode_locked
nfs_update_inode
nfsd write response contains change, ctime, mtime, the flag will be
clear after nfs_update_inode. Howerver, write response does not contain
space_used, previous open response contains space_used whose value is 0,
so inode->i_blocks is still 0.
nfs_getattr -->called by "du -h"
do_update |= force_sync || nfs_attribute_cache_expired -->false in 60s
cache_validity = READ_ONCE(NFS_I(inode)->cache_validity)
do_update |= cache_validity & (NFS_INO_INVALID_ATTR -->false
if (do_update) {
__nfs_revalidate_inode
}
Within 60s, does not send getattr request to nfsd, thus "du -h /mnt/file1M"
is 0.
Add a NFS_INO_INVALID_BLOCKS flag, set it when nfsv4 write is done.
Fixes: 16e143751727 ("NFS: More fine grained attribute tracking")
Signed-off-by: Zheng Bin <zhengbin13@huawei.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/nfs/inode.c | 14 +++++++++++---
include/linux/nfs_fs.h | 1 +
2 files changed, 12 insertions(+), 3 deletions(-)
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index b9d0921cb4fe..0bf1f835de01 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -833,6 +833,8 @@ int nfs_getattr(const struct path *path, struct kstat *stat,
do_update |= cache_validity & NFS_INO_INVALID_ATIME;
if (request_mask & (STATX_CTIME|STATX_MTIME))
do_update |= cache_validity & NFS_INO_REVAL_PAGECACHE;
+ if (request_mask & STATX_BLOCKS)
+ do_update |= cache_validity & NFS_INO_INVALID_BLOCKS;
if (do_update) {
/* Update the attribute cache */
if (!(server->flags & NFS_MOUNT_NOAC))
@@ -1764,7 +1766,8 @@ int nfs_post_op_update_inode_force_wcc_locked(struct inode *inode, struct nfs_fa
status = nfs_post_op_update_inode_locked(inode, fattr,
NFS_INO_INVALID_CHANGE
| NFS_INO_INVALID_CTIME
- | NFS_INO_INVALID_MTIME);
+ | NFS_INO_INVALID_MTIME
+ | NFS_INO_INVALID_BLOCKS);
return status;
}
@@ -1871,7 +1874,8 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
nfsi->cache_validity &= ~(NFS_INO_INVALID_ATTR
| NFS_INO_INVALID_ATIME
| NFS_INO_REVAL_FORCED
- | NFS_INO_REVAL_PAGECACHE);
+ | NFS_INO_REVAL_PAGECACHE
+ | NFS_INO_INVALID_BLOCKS);
/* Do atomic weak cache consistency updates */
nfs_wcc_update_inode(inode, fattr);
@@ -2033,8 +2037,12 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
inode->i_blocks = nfs_calc_block_size(fattr->du.nfs3.used);
} else if (fattr->valid & NFS_ATTR_FATTR_BLOCKS_USED)
inode->i_blocks = fattr->du.nfs2.blocks;
- else
+ else {
+ nfsi->cache_validity |= save_cache_validity &
+ (NFS_INO_INVALID_BLOCKS
+ | NFS_INO_REVAL_FORCED);
cache_revalidated = false;
+ }
/* Update attrtimeo value if we're out of the unstable period */
if (attr_changed) {
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 73eda45f1cfd..6ee9119acc5d 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -230,6 +230,7 @@ struct nfs4_copy_state {
#define NFS_INO_INVALID_OTHER BIT(12) /* other attrs are invalid */
#define NFS_INO_DATA_INVAL_DEFER \
BIT(13) /* Deferred cache invalidation */
+#define NFS_INO_INVALID_BLOCKS BIT(14) /* cached blocks are invalid */
#define NFS_INO_INVALID_ATTR (NFS_INO_INVALID_CHANGE \
| NFS_INO_INVALID_CTIME \
--
2.25.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH AUTOSEL 5.7 381/388] NFS: Fix direct WRITE throughput regression
[not found] <20200618010805.600873-1-sashal@kernel.org>
` (5 preceding siblings ...)
2020-06-18 1:07 ` [PATCH AUTOSEL 5.7 380/388] nfs: set invalid blocks after NFSv4 writes Sasha Levin
@ 2020-06-18 1:07 ` Sasha Levin
6 siblings, 0 replies; 7+ messages in thread
From: Sasha Levin @ 2020-06-18 1:07 UTC (permalink / raw)
To: linux-kernel, stable; +Cc: Chuck Lever, Anna Schumaker, Sasha Levin, linux-nfs
From: Chuck Lever <chuck.lever@oracle.com>
[ Upstream commit ba838a75e73f55a780f1ee896b8e3ecb032dba0f ]
I measured a 50% throughput regression for large direct writes.
The observed on-the-wire behavior is that the client sends every
NFS WRITE twice: once as an UNSTABLE WRITE plus a COMMIT, and once
as a FILE_SYNC WRITE.
This is because the nfs_write_match_verf() check in
nfs_direct_commit_complete() fails for every WRITE.
Buffered writes use nfs_write_completion(), which sets req->wb_verf
correctly. Direct writes use nfs_direct_write_completion(), which
does not set req->wb_verf at all. This leaves req->wb_verf set to
all zeroes for every direct WRITE, and thus
nfs_direct_commit_completion() always sets NFS_ODIRECT_RESCHED_WRITES.
This fix appears to restore nearly all of the lost performance.
Fixes: 1f28476dcb98 ("NFS: Fix O_DIRECT commit verifier handling")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/nfs/direct.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index a57e7c72c7f4..d49b1d197908 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -731,6 +731,8 @@ static void nfs_direct_write_completion(struct nfs_pgio_header *hdr)
nfs_list_remove_request(req);
if (request_commit) {
kref_get(&req->wb_kref);
+ memcpy(&req->wb_verf, &hdr->verf.verifier,
+ sizeof(req->wb_verf));
nfs_mark_request_commit(req, hdr->lseg, &cinfo,
hdr->ds_commit_idx);
}
--
2.25.1
^ permalink raw reply related [flat|nested] 7+ messages in thread