* NFS regression between 5.17 and 5.18
@ 2022-04-28 13:05 Dennis Dalessandro
  2022-04-28 14:57 ` Chuck Lever III
  2022-05-04  9:45 ` Thorsten Leemhuis
  0 siblings, 2 replies; 25+ messages in thread
From: Dennis Dalessandro @ 2022-04-28 13:05 UTC (permalink / raw)
  To: linux-nfs; +Cc: Chuck Lever

Hi NFS folks,

I've noticed a pretty nasty regression in our NFS capability between 5.17 and
5.18-rc1. I've tried to bisect but not having any luck. The problem I'm seeing
is it takes 3 minutes to copy a file from NFS to the local disk. When it should
take less than half a second, which it did up through 5.17.

It doesn't seem to be network related, but can't rule that out completely.

I tried to bisect but the problem can be intermittent. Some runs I'll see a
problem in 3 out of 100 cycles, sometimes 0 out of 100. Sometimes I'll see it
100 out of 100.

The latest attempt I have is at 5.18-rc4 and here's what I see when I 1) create
file with dd, 2) copy local disk to NFS mount, 3) copy NFS to local disk.

Test 2
Creating /dev/shm/run_nfs_test.junk...
262144+0 records in
262144+0 records out
268435456 bytes (268 MB, 256 MiB) copied, 1.29037 s, 208 MB/s
Done
copy /dev/shm/run_nfs_test.junk to /mnt/nfs_test/run_nfs_test.junk...

real    0m0.502s
user    0m0.001s
sys     0m0.250s
Done
copy /mnt/nfs_test/run_nfs_test.junk to /dev/shm/run_nfs_test.tmp...

real    4m11.835s
user    0m0.001s
sys     0m0.277s
Done
Comparing files...
Done
Remove /dev/shm/run_nfs_test.tmp...

real    0m0.031s
user    0m0.000s
sys     0m0.031s
Done
Remove /dev/shm/run_nfs_test.junk...

real    0m0.031s
user    0m0.001s
sys     0m0.030s
Done
Remove /mnt/nfs_test/run_nfs_test.junk...

real    0m0.024s
user    0m0.001s
sys     0m0.022s
Done

Create the server with a 4G RAM disk:
mount -t tmpfs -o size=4096M tmpfs /mnt/nfs_test

exportfs -o fsid=0,rw,async,insecure,no_root_squash 192.168.254.0/255.255.255.0:/mnt/nfs_test
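
The server-side RDMA listener setup isn't shown here; presumably it was
enabled along the lines of the kernel's nfs-rdma documentation (the module
name below is an assumption; the port matches the client mount line below):

modprobe svcrdma                              # load the RPC/RDMA server transport
echo "rdma 20049" > /proc/fs/nfsd/portlist    # have nfsd listen for NFS/RDMA on port 20049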

Mount on the client side with:
mount -o rdma,port=20049 192.168.254.1:/mnt/nfs_test /mnt/nfs_test
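
For reference, the timed sequence above amounts to roughly the following
(a sketch reconstructed from the output; the actual run_nfs_test script is
not included here, and the dd input source is a guess):

dd if=/dev/zero of=/dev/shm/run_nfs_test.junk bs=1024 count=262144   # 256 MiB test file
time cp /dev/shm/run_nfs_test.junk /mnt/nfs_test/run_nfs_test.junk   # local disk -> NFS, ~0.5s
time cp /mnt/nfs_test/run_nfs_test.junk /dev/shm/run_nfs_test.tmp    # NFS -> local disk, the slow step
cmp /dev/shm/run_nfs_test.junk /dev/shm/run_nfs_test.tmp             # "Comparing files..."
rm -f /dev/shm/run_nfs_test.tmp /dev/shm/run_nfs_test.junk /mnt/nfs_test/run_nfs_test.junk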

Any advice on tracking this down further would be greatly appreciated.

Thanks

-Denny



* Re: NFS regression between 5.17 and 5.18
  2022-04-28 13:05 NFS regression between 5.17 and 5.18 Dennis Dalessandro
@ 2022-04-28 14:57 ` Chuck Lever III
  2022-04-28 15:42   ` Dennis Dalessandro
  2022-05-04  9:45 ` Thorsten Leemhuis
  1 sibling, 1 reply; 25+ messages in thread
From: Chuck Lever III @ 2022-04-28 14:57 UTC (permalink / raw)
  To: Dennis Dalessandro; +Cc: Linux NFS Mailing List



> On Apr 28, 2022, at 9:05 AM, Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> wrote:
> 
> Hi NFS folks,
> 
> I've noticed a pretty nasty regression in our NFS capability between 5.17 and
> 5.18-rc1. I've tried to bisect but not having any luck. The problem I'm seeing
> is it takes 3 minutes to copy a file from NFS to the local disk. When it should
> take less than half a second, which it did up through 5.17.
> 
> It doesn't seem to be network related, but can't rule that out completely.
> 
> I tried to bisect but the problem can be intermittent. Some runs I'll see a
> problem in 3 out of 100 cycles, sometimes 0 out of 100. Sometimes I'll see it
> 100 out of 100.

It's not clear from your problem report whether the problem appears
when it's the server running v5.18-rc or the client.

It looks vaguely like a recent client issue where it encounters a
memory shortage while preparing an RPC and a delay is needed.


> The latest attempt I have is at 5.18-rc4 and here's what I see when I 1) create
> file with dd, 2) copy local disk to NFS mount, 3) copy NFS to local disk.
> 
> Test 2
> Creating /dev/shm/run_nfs_test.junk...
> 262144+0 records in
> 262144+0 records out
> 268435456 bytes (268 MB, 256 MiB) copied, 1.29037 s, 208 MB/s
> Done
> copy /dev/shm/run_nfs_test.junk to /mnt/nfs_test/run_nfs_test.junk...
> 
> real    0m0.502s
> user    0m0.001s
> sys     0m0.250s
> Done
> copy /mnt/nfs_test/run_nfs_test.junk to /dev/shm/run_nfs_test.tmp...
> 
> real    4m11.835s
> user    0m0.001s
> sys     0m0.277s
> Done
> Comparing files...
> Done
> Remove /dev/shm/run_nfs_test.tmp...
> 
> real    0m0.031s
> user    0m0.000s
> sys     0m0.031s
> Done
> Remove /dev/shm/run_nfs_test.junk...
> 
> real    0m0.031s
> user    0m0.001s
> sys     0m0.030s
> Done
> Remove /mnt/nfs_test/run_nfs_test.junk...
> 
> real    0m0.024s
> user    0m0.001s
> sys     0m0.022s
> Done
> 
> Create the server with a 4G RAM disk:
> mount -t tmpfs -o size=4096M tmpfs /mnt/nfs_test
> 
> exportfs -o fsid=0,rw,async,insecure,no_root_squash
> 192.168.254.0/255.255.255.0:/mnt/nfs_test
> 
> Mount on the client side with:
> mount -o rdma,port=20049 192.168.254.1:/mnt/nfs_test /mnt/nfs_test
> 
> Any advice on tracking this down further would be greatly appreciated.
> 
> Thanks
> 
> -Denny
> 

--
Chuck Lever





* Re: NFS regression between 5.17 and 5.18
  2022-04-28 14:57 ` Chuck Lever III
@ 2022-04-28 15:42   ` Dennis Dalessandro
  2022-04-28 19:47     ` Dennis Dalessandro
  0 siblings, 1 reply; 25+ messages in thread
From: Dennis Dalessandro @ 2022-04-28 15:42 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Linux NFS Mailing List

On 4/28/22 10:57 AM, Chuck Lever III wrote:
>> On Apr 28, 2022, at 9:05 AM, Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> wrote:
>>
>> Hi NFS folks,
>>
>> I've noticed a pretty nasty regression in our NFS capability between 5.17 and
>> 5.18-rc1. I've tried to bisect but not having any luck. The problem I'm seeing
>> is it takes 3 minutes to copy a file from NFS to the local disk. When it should
>> take less than half a second, which it did up through 5.17.
>>
>> It doesn't seem to be network related, but can't rule that out completely.
>>
>> I tried to bisect but the problem can be intermittent. Some runs I'll see a
>> problem in 3 out of 100 cycles, sometimes 0 out of 100. Sometimes I'll see it
>> 100 out of 100.
> 
> It's not clear from your problem report whether the problem appears
> when it's the server running v5.18-rc or the client.

That's because I don't know which it is. I'll do a quick test and find out. I
was testing the same kernel across both nodes.

-Denny



* Re: NFS regression between 5.17 and 5.18
  2022-04-28 15:42   ` Dennis Dalessandro
@ 2022-04-28 19:47     ` Dennis Dalessandro
  2022-04-28 19:56       ` Trond Myklebust
  0 siblings, 1 reply; 25+ messages in thread
From: Dennis Dalessandro @ 2022-04-28 19:47 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Linux NFS Mailing List

On 4/28/22 11:42 AM, Dennis Dalessandro wrote:
> On 4/28/22 10:57 AM, Chuck Lever III wrote:
>>> On Apr 28, 2022, at 9:05 AM, Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> wrote:
>>>
>>> Hi NFS folks,
>>>
>>> I've noticed a pretty nasty regression in our NFS capability between 5.17 and
>>> 5.18-rc1. I've tried to bisect but not having any luck. The problem I'm seeing
>>> is it takes 3 minutes to copy a file from NFS to the local disk. When it should
>>> take less than half a second, which it did up through 5.17.
>>>
>>> It doesn't seem to be network related, but can't rule that out completely.
>>>
>>> I tried to bisect but the problem can be intermittent. Some runs I'll see a
>>> problem in 3 out of 100 cycles, sometimes 0 out of 100. Sometimes I'll see it
>>> 100 out of 100.
>>
>> It's not clear from your problem report whether the problem appears
>> when it's the server running v5.18-rc or the client.
> 
> That's because I don't know which it is. I'll do a quick test and find out. I
> was testing the same kernel across both nodes.

Looks like it is the client.

server  client  result
------  ------  ------
5.17    5.17    Pass
5.17    5.18    Fail
5.18    5.18    Fail
5.18    5.17    Pass

Is there a patch for the client issue you mentioned that I could try?

-Denny


* Re: NFS regression between 5.17 and 5.18
  2022-04-28 19:47     ` Dennis Dalessandro
@ 2022-04-28 19:56       ` Trond Myklebust
  2022-04-29 12:54         ` Dennis Dalessandro
  0 siblings, 1 reply; 25+ messages in thread
From: Trond Myklebust @ 2022-04-28 19:56 UTC (permalink / raw)
  To: dennis.dalessandro, chuck.lever; +Cc: linux-nfs


On Thu, 2022-04-28 at 15:47 -0400, Dennis Dalessandro wrote:
> On 4/28/22 11:42 AM, Dennis Dalessandro wrote:
> > On 4/28/22 10:57 AM, Chuck Lever III wrote:
> > > > On Apr 28, 2022, at 9:05 AM, Dennis Dalessandro
> > > > <dennis.dalessandro@cornelisnetworks.com> wrote:
> > > > 
> > > > Hi NFS folks,
> > > > 
> > > > I've noticed a pretty nasty regression in our NFS capability
> > > > between 5.17 and
> > > > 5.18-rc1. I've tried to bisect but not having any luck. The
> > > > problem I'm seeing
> > > > is it takes 3 minutes to copy a file from NFS to the local
> > > > disk. When it should
> > > > take less than half a second, which it did up through 5.17.
> > > > 
> > > > It doesn't seem to be network related, but can't rule that out
> > > > completely.
> > > > 
> > > > I tried to bisect but the problem can be intermittent. Some
> > > > runs I'll see a
> > > > problem in 3 out of 100 cycles, sometimes 0 out of 100.
> > > > Sometimes I'll see it
> > > > 100 out of 100.
> > > 
> > > It's not clear from your problem report whether the problem
> > > appears
> > > when it's the server running v5.18-rc or the client.
> > 
> > That's because I don't know which it is. I'll do a quick test and
> > find out. I
> > was testing the same kernel across both nodes.
> 
> Looks like it is the client.
> 
> server  client  result
> ------  ------  ------
> 5.17    5.17    Pass
> 5.17    5.18    Fail
> 5.18    5.18    Fail
> 5.18    5.17    Pass
> 
> Is there a patch for the client issue you mentioned that I could try?
> 
> -Denny

Try this one


-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



[-- Attachment #2: 0001-SUNRPC-Fix-an-RPC-RDMA-performance-regression.patch --]
[-- Type: text/x-patch; name="0001-SUNRPC-Fix-an-RPC-RDMA-performance-regression.patch", Size: 2444 bytes --]

From 0505ecf76f5131525e696d3dae3bd456a8b8b3ad Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.myklebust@hammerspace.com>
Date: Thu, 28 Apr 2022 15:46:01 -0400
Subject: [PATCH] SUNRPC: Fix an RPC/RDMA performance regression

Use the standard gfp mask instead of using GFP_NOWAIT. The latter causes
issues when under memory pressure.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 net/sunrpc/auth_gss/auth_gss.c  | 11 ++++-------
 net/sunrpc/sched.c              |  1 +
 net/sunrpc/xprtrdma/transport.c |  6 +-----
 3 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
index de7e5b41ab8f..a31a27816cc0 100644
--- a/net/sunrpc/auth_gss/auth_gss.c
+++ b/net/sunrpc/auth_gss/auth_gss.c
@@ -1340,14 +1340,11 @@ gss_hash_cred(struct auth_cred *acred, unsigned int hashbits)
 /*
  * Lookup RPCSEC_GSS cred for the current process
  */
-static struct rpc_cred *
-gss_lookup_cred(struct rpc_auth *auth, struct auth_cred *acred, int flags)
+static struct rpc_cred *gss_lookup_cred(struct rpc_auth *auth,
+					struct auth_cred *acred, int flags)
 {
-	gfp_t gfp = GFP_KERNEL;
-
-	if (flags & RPCAUTH_LOOKUP_ASYNC)
-		gfp = GFP_NOWAIT | __GFP_NOWARN;
-	return rpcauth_lookup_credcache(auth, acred, flags, gfp);
+	return rpcauth_lookup_credcache(auth, acred, flags,
+					rpc_task_gfp_mask());
 }
 
 static struct rpc_cred *
diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 7f70c1e608b7..25b9221950ff 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -63,6 +63,7 @@ gfp_t rpc_task_gfp_mask(void)
 		return GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN;
 	return GFP_KERNEL;
 }
+EXPORT_SYMBOL_GPL(rpc_task_gfp_mask);
 
 unsigned long
 rpc_task_timeout(const struct rpc_task *task)
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 6b7e10e5a141..bcb37b51adf6 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -571,11 +571,7 @@ xprt_rdma_allocate(struct rpc_task *task)
 	struct rpc_rqst *rqst = task->tk_rqstp;
 	struct rpcrdma_xprt *r_xprt = rpcx_to_rdmax(rqst->rq_xprt);
 	struct rpcrdma_req *req = rpcr_to_rdmar(rqst);
-	gfp_t flags;
-
-	flags = RPCRDMA_DEF_GFP;
-	if (RPC_IS_ASYNC(task))
-		flags = GFP_NOWAIT | __GFP_NOWARN;
+	gfp_t flags = rpc_task_gfp_mask();
 
 	if (!rpcrdma_check_regbuf(r_xprt, req->rl_sendbuf, rqst->rq_callsize,
 				  flags))
-- 
2.35.1



* Re: NFS regression between 5.17 and 5.18
  2022-04-28 19:56       ` Trond Myklebust
@ 2022-04-29 12:54         ` Dennis Dalessandro
  2022-04-29 13:37           ` Chuck Lever III
  0 siblings, 1 reply; 25+ messages in thread
From: Dennis Dalessandro @ 2022-04-29 12:54 UTC (permalink / raw)
  To: Trond Myklebust, chuck.lever; +Cc: linux-nfs

On 4/28/22 3:56 PM, Trond Myklebust wrote:
> On Thu, 2022-04-28 at 15:47 -0400, Dennis Dalessandro wrote:
>> On 4/28/22 11:42 AM, Dennis Dalessandro wrote:
>>> On 4/28/22 10:57 AM, Chuck Lever III wrote:
>>>>> On Apr 28, 2022, at 9:05 AM, Dennis Dalessandro
>>>>> <dennis.dalessandro@cornelisnetworks.com> wrote:
>>>>>
>>>>> Hi NFS folks,
>>>>>
>>>>> I've noticed a pretty nasty regression in our NFS capability
>>>>> between 5.17 and
>>>>> 5.18-rc1. I've tried to bisect but not having any luck. The
>>>>> problem I'm seeing
>>>>> is it takes 3 minutes to copy a file from NFS to the local
>>>>> disk. When it should
>>>>> take less than half a second, which it did up through 5.17.
>>>>>
>>>>> It doesn't seem to be network related, but can't rule that out
>>>>> completely.
>>>>>
>>>>> I tried to bisect but the problem can be intermittent. Some
>>>>> runs I'll see a
>>>>> problem in 3 out of 100 cycles, sometimes 0 out of 100.
>>>>> Sometimes I'll see it
>>>>> 100 out of 100.
>>>>
>>>> It's not clear from your problem report whether the problem
>>>> appears
>>>> when it's the server running v5.18-rc or the client.
>>>
>>> That's because I don't know which it is. I'll do a quick test and
>>> find out. I
>>> was testing the same kernel across both nodes.
>>
>> Looks like it is the client.
>>
>> server  client  result
>> ------  ------  ------
>> 5.17    5.17    Pass
>> 5.17    5.18    Fail
>> 5.18    5.18    Fail
>> 5.18    5.17    Pass
>>
>> Is there a patch for the client issue you mentioned that I could try?
>>
>> -Denny
> 
> Try this one

Thanks for the patch. Unfortunately it doesn't seem to solve the issue, still
see intermittent hangs. I applied it on top of -rc4:

copy /mnt/nfs_test/run_nfs_test.junk to /dev/shm/run_nfs_test.tmp...

real    2m6.072s
user    0m0.002s
sys     0m0.263s
Done

While it was hung I checked the mem usage on the machine:

# free -h
              total        used        free      shared  buff/cache   available
Mem:           62Gi       871Mi        61Gi       342Mi       889Mi        61Gi
Swap:         4.0Gi          0B       4.0Gi

Doesn't appear to be under memory pressure.

-Denny


* Re: NFS regression between 5.17 and 5.18
  2022-04-29 12:54         ` Dennis Dalessandro
@ 2022-04-29 13:37           ` Chuck Lever III
  2022-05-06 13:24             ` Dennis Dalessandro
  0 siblings, 1 reply; 25+ messages in thread
From: Chuck Lever III @ 2022-04-29 13:37 UTC (permalink / raw)
  To: Dennis Dalessandro; +Cc: Trond Myklebust, Linux NFS Mailing List



> On Apr 29, 2022, at 8:54 AM, Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> wrote:
> 
> On 4/28/22 3:56 PM, Trond Myklebust wrote:
>> On Thu, 2022-04-28 at 15:47 -0400, Dennis Dalessandro wrote:
>>> On 4/28/22 11:42 AM, Dennis Dalessandro wrote:
>>>> On 4/28/22 10:57 AM, Chuck Lever III wrote:
>>>>>> On Apr 28, 2022, at 9:05 AM, Dennis Dalessandro
>>>>>> <dennis.dalessandro@cornelisnetworks.com> wrote:
>>>>>> 
>>>>>> Hi NFS folks,
>>>>>> 
>>>>>> I've noticed a pretty nasty regression in our NFS capability
>>>>>> between 5.17 and
>>>>>> 5.18-rc1. I've tried to bisect but not having any luck. The
>>>>>> problem I'm seeing
>>>>>> is it takes 3 minutes to copy a file from NFS to the local
>>>>>> disk. When it should
>>>>>> take less than half a second, which it did up through 5.17.
>>>>>> 
>>>>>> It doesn't seem to be network related, but can't rule that out
>>>>>> completely.
>>>>>> 
>>>>>> I tried to bisect but the problem can be intermittent. Some
>>>>>> runs I'll see a
>>>>>> problem in 3 out of 100 cycles, sometimes 0 out of 100.
>>>>>> Sometimes I'll see it
>>>>>> 100 out of 100.
>>>>> 
>>>>> It's not clear from your problem report whether the problem
>>>>> appears
>>>>> when it's the server running v5.18-rc or the client.
>>>> 
>>>> That's because I don't know which it is. I'll do a quick test and
>>>> find out. I
>>>> was testing the same kernel across both nodes.
>>> 
>>> Looks like it is the client.
>>> 
>>> server  client  result
>>> ------  ------  ------
>>> 5.17    5.17    Pass
>>> 5.17    5.18    Fail
>>> 5.18    5.18    Fail
>>> 5.18    5.17    Pass
>>> 
>>> Is there a patch for the client issue you mentioned that I could try?
>>> 
>>> -Denny
>> 
>> Try this one
> 
> Thanks for the patch. Unfortunately it doesn't seem to solve the issue, still
> see intermittent hangs. I applied it on top of -rc4:
> 
> copy /mnt/nfs_test/run_nfs_test.junk to /dev/shm/run_nfs_test.tmp...
> 
> real    2m6.072s
> user    0m0.002s
> sys     0m0.263s
> Done
> 
> While it was hung I checked the mem usage on the machine:
> 
> # free -h
>              total        used        free      shared  buff/cache   available
> Mem:           62Gi       871Mi        61Gi       342Mi       889Mi        61Gi
> Swap:         4.0Gi          0B       4.0Gi
> 
> Doesn't appear to be under memory pressure.

Hi, since you know now that it is the client, perhaps a bisect
would be more successful?
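
For reference, the usual sequence would be something like the following on
the client (the tricky part is the intermittency, so each bisect step
probably needs a few hundred copy cycles before it can be trusted):

git bisect start
git bisect bad v5.18-rc1     # first kernel known to show the slow copies
git bisect good v5.17        # last known-good kernel
# build and boot each kernel that bisect offers, run enough copy cycles
# to be confident, then mark the result:
git bisect good              # or: git bisect bad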


--
Chuck Lever





* Re: NFS regression between 5.17 and 5.18
  2022-04-28 13:05 NFS regression between 5.17 and 5.18 Dennis Dalessandro
  2022-04-28 14:57 ` Chuck Lever III
@ 2022-05-04  9:45 ` Thorsten Leemhuis
  1 sibling, 0 replies; 25+ messages in thread
From: Thorsten Leemhuis @ 2022-05-04  9:45 UTC (permalink / raw)
  To: Dennis Dalessandro; +Cc: Chuck Lever, regressions, linux-nfs

[TLDR: I'm adding the regression report below to regzbot, the Linux
kernel regression tracking bot; all text you find below is compiled from
a few template paragraphs you might have encountered already
from similar mails.]

Hi, this is your Linux kernel regression tracker. Top-posting for once,
to make this easily accessible to everyone.

To be sure the issue below doesn't fall through the cracks unnoticed, I'm
adding it to regzbot, my Linux kernel regression tracking bot:

#regzbot ^introduced v5.17..v5.18
#regzbot title nfs: copy from NFS to the local disk suddenly takes 3
min instead of half a second
#regzbot ignore-activity

If it turns out this isn't a regression, feel free to remove it from the
tracking by sending a reply to this thread containing a paragraph like
"#regzbot invalid: reason why this is invalid" (without the quotes).

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.

On 28.04.22 15:05, Dennis Dalessandro wrote:
> Hi NFS folks,
> 
> I've noticed a pretty nasty regression in our NFS capability between 5.17 and
> 5.18-rc1. I've tried to bisect but not having any luck. The problem I'm seeing
> is it takes 3 minutes to copy a file from NFS to the local disk. When it should
> take less than half a second, which it did up through 5.17.
> 
> It doesn't seem to be network related, but can't rule that out completely.
> 
> I tried to bisect but the problem can be intermittent. Some runs I'll see a
> problem in 3 out of 100 cycles, sometimes 0 out of 100. Sometimes I'll see it
> 100 out of 100.
> 
> The latest attempt I have is at 5.18-rc4 and here's what I see when I 1) create
> file with dd, 2) copy local disk to NFS mount, 3) copy NFS to local disk.
> 
> Test 2
> Creating /dev/shm/run_nfs_test.junk...
> 262144+0 records in
> 262144+0 records out
> 268435456 bytes (268 MB, 256 MiB) copied, 1.29037 s, 208 MB/s
> Done
> copy /dev/shm/run_nfs_test.junk to /mnt/nfs_test/run_nfs_test.junk...
> 
> real    0m0.502s
> user    0m0.001s
> sys     0m0.250s
> Done
> copy /mnt/nfs_test/run_nfs_test.junk to /dev/shm/run_nfs_test.tmp...
> 
> real    4m11.835s
> user    0m0.001s
> sys     0m0.277s
> Done
> Comparing files...
> Done
> Remove /dev/shm/run_nfs_test.tmp...
> 
> real    0m0.031s
> user    0m0.000s
> sys     0m0.031s
> Done
> Remove /dev/shm/run_nfs_test.junk...
> 
> real    0m0.031s
> user    0m0.001s
> sys     0m0.030s
> Done
> Remove /mnt/nfs_test/run_nfs_test.junk...
> 
> real    0m0.024s
> user    0m0.001s
> sys     0m0.022s
> Done
> 
> Create the server with a 4G RAM disk:
> mount -t tmpfs -o size=4096M tmpfs /mnt/nfs_test
> 
> exportfs -o fsid=0,rw,async,insecure,no_root_squash
> 192.168.254.0/255.255.255.0:/mnt/nfs_test
> 
> Mount on the client side with:
> mount -o rdma,port=20049 192.168.254.1:/mnt/nfs_test /mnt/nfs_test
> 
> Any advice on tracking this down further would be greatly appreciated.
> 
> Thanks
> 
> -Denny
> 

-- 
Additional information about regzbot:

If you want to know more about regzbot, check out its web-interface, the
getting start guide, and the references documentation:

https://linux-regtracking.leemhuis.info/regzbot/
https://gitlab.com/knurd42/regzbot/-/blob/main/docs/getting_started.md
https://gitlab.com/knurd42/regzbot/-/blob/main/docs/reference.md

The last two documents will explain how you can interact with regzbot
yourself if you want to.

Hint for reporters: when reporting a regression it's in your interest to
CC the regression list and tell regzbot about the issue, as that ensures
the regression makes it onto the radar of the Linux kernel's regression
tracker and your report won't fall through the cracks unnoticed.

Hint for developers: you normally don't need to care about regzbot once
it's involved. Fix the issue as you normally would, just remember to
include a 'Link:' tag in the patch descriptions pointing to all reports
about the issue. This has been expected from developers even before
regzbot showed up for reasons explained in
'Documentation/process/submitting-patches.rst' and
'Documentation/process/5.Posting.rst'.


* Re: NFS regression between 5.17 and 5.18
  2022-04-29 13:37           ` Chuck Lever III
@ 2022-05-06 13:24             ` Dennis Dalessandro
  2022-05-13 11:58               ` Dennis Dalessandro
  0 siblings, 1 reply; 25+ messages in thread
From: Dennis Dalessandro @ 2022-05-06 13:24 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Trond Myklebust, Linux NFS Mailing List

On 4/29/22 9:37 AM, Chuck Lever III wrote:
> 
> 
>> On Apr 29, 2022, at 8:54 AM, Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> wrote:
>>
>> On 4/28/22 3:56 PM, Trond Myklebust wrote:
>>> On Thu, 2022-04-28 at 15:47 -0400, Dennis Dalessandro wrote:
>>>> On 4/28/22 11:42 AM, Dennis Dalessandro wrote:
>>>>> On 4/28/22 10:57 AM, Chuck Lever III wrote:
>>>>>>> On Apr 28, 2022, at 9:05 AM, Dennis Dalessandro
>>>>>>> <dennis.dalessandro@cornelisnetworks.com> wrote:
>>>>>>>
>>>>>>> Hi NFS folks,
>>>>>>>
>>>>>>> I've noticed a pretty nasty regression in our NFS capability
>>>>>>> between 5.17 and
>>>>>>> 5.18-rc1. I've tried to bisect but not having any luck. The
>>>>>>> problem I'm seeing
>>>>>>> is it takes 3 minutes to copy a file from NFS to the local
>>>>>>> disk. When it should
>>>>>>> take less than half a second, which it did up through 5.17.
>>>>>>>
>>>>>>> It doesn't seem to be network related, but can't rule that out
>>>>>>> completely.
>>>>>>>
>>>>>>> I tried to bisect but the problem can be intermittent. Some
>>>>>>> runs I'll see a
>>>>>>> problem in 3 out of 100 cycles, sometimes 0 out of 100.
>>>>>>> Sometimes I'll see it
>>>>>>> 100 out of 100.
>>>>>>
>>>>>> It's not clear from your problem report whether the problem
>>>>>> appears
>>>>>> when it's the server running v5.18-rc or the client.
>>>>>
>>>>> That's because I don't know which it is. I'll do a quick test and
>>>>> find out. I
>>>>> was testing the same kernel across both nodes.
>>>>
>>>> Looks like it is the client.
>>>>
>>>> server  client  result
>>>> ------  ------  ------
>>>> 5.17    5.17    Pass
>>>> 5.17    5.18    Fail
>>>> 5.18    5.18    Fail
>>>> 5.18    5.17    Pass
>>>>
>>>> Is there a patch for the client issue you mentioned that I could try?
>>>>
>>>> -Denny
>>>
>>> Try this one
>>
>> Thanks for the patch. Unfortunately it doesn't seem to solve the issue, still
>> see intermittent hangs. I applied it on top of -rc4:
>>
>> copy /mnt/nfs_test/run_nfs_test.junk to /dev/shm/run_nfs_test.tmp...
>>
>> real    2m6.072s
>> user    0m0.002s
>> sys     0m0.263s
>> Done
>>
>> While it was hung I checked the mem usage on the machine:
>>
>> # free -h
>>              total        used        free      shared  buff/cache   available
>> Mem:           62Gi       871Mi        61Gi       342Mi       889Mi        61Gi
>> Swap:         4.0Gi          0B       4.0Gi
>>
>> Doesn't appear to be under memory pressure.
> 
> Hi, since you know now that it is the client, perhaps a bisect
> would be more successful?

I've been testing all week. I pulled the nfs-rdma tree that was sent to Linus
for 5.18 and tested. I see the problem on pretty much all the patches. However
it's the frequency that it hits which changes.

I'll see 1-5 cycles out of 2500 where the copy takes minutes up to:
"NFS: Convert readdir page cache to use a cookie based index"

After this I start seeing it around 10 times in 500 and by the last patch 10
times in less than 100.

Is there any kind of tracing/debugging I could turn on to get more insight on
what is taking so long when it does go bad?
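
For instance, would the sunrpc/rpcrdma tracepoints capture anything useful
while the copy is stuck? Something along these lines (a sketch, assuming
debugfs is mounted in the usual place and the rpcrdma module is loaded):

echo 1 > /sys/kernel/debug/tracing/events/sunrpc/enable
echo 1 > /sys/kernel/debug/tracing/events/rpcrdma/enable
cp /mnt/nfs_test/run_nfs_test.junk /dev/shm/run_nfs_test.tmp    # the copy that sometimes hangs
cat /sys/kernel/debug/tracing/trace > /tmp/nfs-hang-trace.txt   # save the captured events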

-Denny






* Re: NFS regression between 5.17 and 5.18
  2022-05-06 13:24             ` Dennis Dalessandro
@ 2022-05-13 11:58               ` Dennis Dalessandro
  2022-05-13 13:30                 ` Trond Myklebust
  0 siblings, 1 reply; 25+ messages in thread
From: Dennis Dalessandro @ 2022-05-13 11:58 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Trond Myklebust, Linux NFS Mailing List

On 5/6/22 9:24 AM, Dennis Dalessandro wrote:
> On 4/29/22 9:37 AM, Chuck Lever III wrote:
>>
>>
>>> On Apr 29, 2022, at 8:54 AM, Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> wrote:
>>>
>>> On 4/28/22 3:56 PM, Trond Myklebust wrote:
>>>> On Thu, 2022-04-28 at 15:47 -0400, Dennis Dalessandro wrote:
>>>>> On 4/28/22 11:42 AM, Dennis Dalessandro wrote:
>>>>>> On 4/28/22 10:57 AM, Chuck Lever III wrote:
>>>>>>>> On Apr 28, 2022, at 9:05 AM, Dennis Dalessandro
>>>>>>>> <dennis.dalessandro@cornelisnetworks.com> wrote:
>>>>>>>>
>>>>>>>> Hi NFS folks,
>>>>>>>>
>>>>>>>> I've noticed a pretty nasty regression in our NFS capability
>>>>>>>> between 5.17 and
>>>>>>>> 5.18-rc1. I've tried to bisect but not having any luck. The
>>>>>>>> problem I'm seeing
>>>>>>>> is it takes 3 minutes to copy a file from NFS to the local
>>>>>>>> disk. When it should
>>>>>>>> take less than half a second, which it did up through 5.17.
>>>>>>>>
>>>>>>>> It doesn't seem to be network related, but can't rule that out
>>>>>>>> completely.
>>>>>>>>
>>>>>>>> I tried to bisect but the problem can be intermittent. Some
>>>>>>>> runs I'll see a
>>>>>>>> problem in 3 out of 100 cycles, sometimes 0 out of 100.
>>>>>>>> Sometimes I'll see it
>>>>>>>> 100 out of 100.
>>>>>>>
>>>>>>> It's not clear from your problem report whether the problem
>>>>>>> appears
>>>>>>> when it's the server running v5.18-rc or the client.
>>>>>>
>>>>>> That's because I don't know which it is. I'll do a quick test and
>>>>>> find out. I
>>>>>> was testing the same kernel across both nodes.
>>>>>
>>>>> Looks like it is the client.
>>>>>
>>>>> server  client  result
>>>>> ------  ------  ------
>>>>> 5.17    5.17    Pass
>>>>> 5.17    5.18    Fail
>>>>> 5.18    5.18    Fail
>>>>> 5.18    5.17    Pass
>>>>>
>>>>> Is there a patch for the client issue you mentioned that I could try?
>>>>>
>>>>> -Denny
>>>>
>>>> Try this one
>>>
>>> Thanks for the patch. Unfortunately it doesn't seem to solve the issue, still
>>> see intermittent hangs. I applied it on top of -rc4:
>>>
>>> copy /mnt/nfs_test/run_nfs_test.junk to /dev/shm/run_nfs_test.tmp...
>>>
>>> real    2m6.072s
>>> user    0m0.002s
>>> sys     0m0.263s
>>> Done
>>>
>>> While it was hung I checked the mem usage on the machine:
>>>
>>> # free -h
>>>              total        used        free      shared  buff/cache   available
>>> Mem:           62Gi       871Mi        61Gi       342Mi       889Mi        61Gi
>>> Swap:         4.0Gi          0B       4.0Gi
>>>
>>> Doesn't appear to be under memory pressure.
>>
>> Hi, since you know now that it is the client, perhaps a bisect
>> would be more successful?
> 
> I've been testing all week. I pulled the nfs-rdma tree that was sent to Linus
> for 5.18 and tested. I see the problem on pretty much all the patches. However
> it's the frequency that it hits which changes.
> 
> I'll see 1-5 cycles out of 2500 where the copy takes minutes up to:
> "NFS: Convert readdir page cache to use a cookie based index"
> 
> After this I start seeing it around 10 times in 500 and by the last patch 10
> times in less than 100.
> 
> Is there any kind of tracing/debugging I could turn on to get more insight on
> what is taking so long when it does go bad?
> 

Ran a test with -rc6 and this time see a hung task trace on the console as well
as an NFS RPC error.

[32719.991175] nfs: RPC call returned error 512
.
.
.
[32933.285126] INFO: task kworker/u145:23:886141 blocked for more than 122 seconds.
[32933.293543]       Tainted: G S                5.18.0-rc6 #1
[32933.299869] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[32933.308740] task:kworker/u145:23 state:D stack:    0 pid:886141 ppid:     2
flags:0x00004000
[32933.318321] Workqueue: rpciod rpc_async_schedule [sunrpc]
[32933.324524] Call Trace:
[32933.327347]  <TASK>
[32933.329785]  __schedule+0x3dd/0x970
[32933.333783]  schedule+0x41/0xa0
[32933.337388]  xprt_request_dequeue_xprt+0xd1/0x140 [sunrpc]
[32933.343639]  ? prepare_to_wait+0xd0/0xd0
[32933.348123]  ? rpc_destroy_wait_queue+0x10/0x10 [sunrpc]
[32933.354183]  xprt_release+0x26/0x140 [sunrpc]
[32933.359168]  ? rpc_destroy_wait_queue+0x10/0x10 [sunrpc]
[32933.365225]  rpc_release_resources_task+0xe/0x50 [sunrpc]
[32933.371381]  __rpc_execute+0x2c5/0x4e0 [sunrpc]
[32933.376564]  ? __switch_to_asm+0x42/0x70
[32933.381046]  ? finish_task_switch+0xb2/0x2c0
[32933.385918]  rpc_async_schedule+0x29/0x40 [sunrpc]
[32933.391391]  process_one_work+0x1c8/0x390
[32933.395975]  worker_thread+0x30/0x360
[32933.400162]  ? process_one_work+0x390/0x390
[32933.404931]  kthread+0xd9/0x100
[32933.408536]  ? kthread_complete_and_exit+0x20/0x20
[32933.413984]  ret_from_fork+0x22/0x30
[32933.418074]  </TASK>

The call trace shows up again at 245, 368, and 491 seconds. Same task, same trace.






* Re: NFS regression between 5.17 and 5.18
  2022-05-13 11:58               ` Dennis Dalessandro
@ 2022-05-13 13:30                 ` Trond Myklebust
  2022-05-13 14:59                   ` Chuck Lever III
  0 siblings, 1 reply; 25+ messages in thread
From: Trond Myklebust @ 2022-05-13 13:30 UTC (permalink / raw)
  To: aglo, dennis.dalessandro, chuck.lever; +Cc: linux-nfs

On Fri, 2022-05-13 at 07:58 -0400, Dennis Dalessandro wrote:
> On 5/6/22 9:24 AM, Dennis Dalessandro wrote:
> > On 4/29/22 9:37 AM, Chuck Lever III wrote:
> > > 
> > > 
> > > > On Apr 29, 2022, at 8:54 AM, Dennis Dalessandro
> > > > <dennis.dalessandro@cornelisnetworks.com> wrote:
> > > > 
> > > > On 4/28/22 3:56 PM, Trond Myklebust wrote:
> > > > > On Thu, 2022-04-28 at 15:47 -0400, Dennis Dalessandro wrote:
> > > > > > On 4/28/22 11:42 AM, Dennis Dalessandro wrote:
> > > > > > > On 4/28/22 10:57 AM, Chuck Lever III wrote:
> > > > > > > > > On Apr 28, 2022, at 9:05 AM, Dennis Dalessandro
> > > > > > > > > <dennis.dalessandro@cornelisnetworks.com> wrote:
> > > > > > > > > 
> > > > > > > > > Hi NFS folks,
> > > > > > > > > 
> > > > > > > > > I've noticed a pretty nasty regression in our NFS
> > > > > > > > > capability
> > > > > > > > > between 5.17 and
> > > > > > > > > 5.18-rc1. I've tried to bisect but not having any
> > > > > > > > > luck. The
> > > > > > > > > problem I'm seeing
> > > > > > > > > is it takes 3 minutes to copy a file from NFS to the
> > > > > > > > > local
> > > > > > > > > disk. When it should
> > > > > > > > > take less than half a second, which it did up through
> > > > > > > > > 5.17.
> > > > > > > > > 
> > > > > > > > > It doesn't seem to be network related, but can't rule
> > > > > > > > > that out
> > > > > > > > > completely.
> > > > > > > > > 
> > > > > > > > > I tried to bisect but the problem can be
> > > > > > > > > intermittent. Some
> > > > > > > > > runs I'll see a
> > > > > > > > > problem in 3 out of 100 cycles, sometimes 0 out of
> > > > > > > > > 100.
> > > > > > > > > Sometimes I'll see it
> > > > > > > > > 100 out of 100.
> > > > > > > > 
> > > > > > > > It's not clear from your problem report whether the
> > > > > > > > problem
> > > > > > > > appears
> > > > > > > > when it's the server running v5.18-rc or the client.
> > > > > > > 
> > > > > > > That's because I don't know which it is. I'll do a quick
> > > > > > > test and
> > > > > > > find out. I
> > > > > > > was testing the same kernel across both nodes.
> > > > > > 
> > > > > > Looks like it is the client.
> > > > > > 
> > > > > > server  client  result
> > > > > > ------  ------  ------
> > > > > > 5.17    5.17    Pass
> > > > > > 5.17    5.18    Fail
> > > > > > 5.18    5.18    Fail
> > > > > > 5.18    5.17    Pass
> > > > > > 
> > > > > > Is there a patch for the client issue you mentioned that I
> > > > > > could try?
> > > > > > 
> > > > > > -Denny
> > > > > 
> > > > > Try this one
> > > > 
> > > > Thanks for the patch. Unfortunately it doesn't seem to solve
> > > > the issue, still
> > > > see intermittent hangs. I applied it on top of -rc4:
> > > > 
> > > > copy /mnt/nfs_test/run_nfs_test.junk to
> > > > /dev/shm/run_nfs_test.tmp...
> > > > 
> > > > real    2m6.072s
> > > > user    0m0.002s
> > > > sys     0m0.263s
> > > > Done
> > > > 
> > > > While it was hung I checked the mem usage on the machine:
> > > > 
> > > > # free -h
> > > >              total        used        free      shared 
> > > > buff/cache   available
> > > > Mem:           62Gi       871Mi        61Gi       342Mi      
> > > > 889Mi        61Gi
> > > > Swap:         4.0Gi          0B       4.0Gi
> > > > 
> > > > Doesn't appear to be under memory pressure.
> > > 
> > > Hi, since you know now that it is the client, perhaps a bisect
> > > would be more successful?
> > 
> > I've been testing all week. I pulled the nfs-rdma tree that was
> > sent to Linus
> > for 5.18 and tested. I see the problem on pretty much all the
> > patches. However
> > it's the frequency that it hits which changes.
> > 
> > I'll see 1-5 cycles out of 2500 where the copy takes minutes up to:
> > "NFS: Convert readdir page cache to use a cookie based index"
> > 
> > After this I start seeing it around 10 times in 500 and by the last
> > patch 10
> > times in less than 100.
> > 
> > Is there any kind of tracing/debugging I could turn on to get more
> > insight on
> > what is taking so long when it does go bad?
> > 
> 
> Ran a test with -rc6 and this time see a hung task trace on the
> console as well
> as an NFS RPC error.
> 
> [32719.991175] nfs: RPC call returned error 512
> .
> .
> .
> [32933.285126] INFO: task kworker/u145:23:886141 blocked for more
> than 122 seconds.
> [32933.293543]       Tainted: G S                5.18.0-rc6 #1
> [32933.299869] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this
> message.
> [32933.308740] task:kworker/u145:23 state:D stack:    0 pid:886141
> ppid:     2
> flags:0x00004000
> [32933.318321] Workqueue: rpciod rpc_async_schedule [sunrpc]
> [32933.324524] Call Trace:
> [32933.327347]  <TASK>
> [32933.329785]  __schedule+0x3dd/0x970
> [32933.333783]  schedule+0x41/0xa0
> [32933.337388]  xprt_request_dequeue_xprt+0xd1/0x140 [sunrpc]
> [32933.343639]  ? prepare_to_wait+0xd0/0xd0
> [32933.348123]  ? rpc_destroy_wait_queue+0x10/0x10 [sunrpc]
> [32933.354183]  xprt_release+0x26/0x140 [sunrpc]
> [32933.359168]  ? rpc_destroy_wait_queue+0x10/0x10 [sunrpc]
> [32933.365225]  rpc_release_resources_task+0xe/0x50 [sunrpc]
> [32933.371381]  __rpc_execute+0x2c5/0x4e0 [sunrpc]
> [32933.376564]  ? __switch_to_asm+0x42/0x70
> [32933.381046]  ? finish_task_switch+0xb2/0x2c0
> [32933.385918]  rpc_async_schedule+0x29/0x40 [sunrpc]
> [32933.391391]  process_one_work+0x1c8/0x390
> [32933.395975]  worker_thread+0x30/0x360
> [32933.400162]  ? process_one_work+0x390/0x390
> [32933.404931]  kthread+0xd9/0x100
> [32933.408536]  ? kthread_complete_and_exit+0x20/0x20
> [32933.413984]  ret_from_fork+0x22/0x30
> [32933.418074]  </TASK>
> 
> The call trace shows up again at 245, 368, and 491 seconds. Same
> task, same trace.
> 
> 
> 
> 

That's very helpful. The above trace suggests that the RDMA code is
leaking a call to xprt_unpin_rqst().


From what I can see, rpcrdma_reply_handler() will always call
xprt_pin_rqst(), and that call is required to be matched by a call to
xprt_unpin_rqst() before the handler exits. However those calls to
xprt_unpin_rqst() appear to be scattered around the code, and it is not
at all obvious that the requirement is being fulfilled.

Chuck, why does that code need to be so complex?

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com




* Re: NFS regression between 5.17 and 5.18
  2022-05-13 13:30                 ` Trond Myklebust
@ 2022-05-13 14:59                   ` Chuck Lever III
  2022-05-13 15:19                     ` Trond Myklebust
  2022-05-17 13:40                     ` Dennis Dalessandro
  0 siblings, 2 replies; 25+ messages in thread
From: Chuck Lever III @ 2022-05-13 14:59 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Olga Kornievskaia, dennis.dalessandro, Linux NFS Mailing List



> On May 13, 2022, at 9:30 AM, Trond Myklebust <trondmy@hammerspace.com> wrote:
> 
> On Fri, 2022-05-13 at 07:58 -0400, Dennis Dalessandro wrote:
>> On 5/6/22 9:24 AM, Dennis Dalessandro wrote:
>>> On 4/29/22 9:37 AM, Chuck Lever III wrote:
>>>> 
>>>> 
>>>>> On Apr 29, 2022, at 8:54 AM, Dennis Dalessandro
>>>>> <dennis.dalessandro@cornelisnetworks.com> wrote:
>>>>> 
>>>>> On 4/28/22 3:56 PM, Trond Myklebust wrote:
>>>>>> On Thu, 2022-04-28 at 15:47 -0400, Dennis Dalessandro wrote:
>>>>>>> On 4/28/22 11:42 AM, Dennis Dalessandro wrote:
>>>>>>>> On 4/28/22 10:57 AM, Chuck Lever III wrote:
>>>>>>>>>> On Apr 28, 2022, at 9:05 AM, Dennis Dalessandro
>>>>>>>>>> <dennis.dalessandro@cornelisnetworks.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> Hi NFS folks,
>>>>>>>>>> 
>>>>>>>>>> I've noticed a pretty nasty regression in our NFS
>>>>>>>>>> capability
>>>>>>>>>> between 5.17 and
>>>>>>>>>> 5.18-rc1. I've tried to bisect but not having any
>>>>>>>>>> luck. The
>>>>>>>>>> problem I'm seeing
>>>>>>>>>> is it takes 3 minutes to copy a file from NFS to the
>>>>>>>>>> local
>>>>>>>>>> disk. When it should
>>>>>>>>>> take less than half a second, which it did up through
>>>>>>>>>> 5.17.
>>>>>>>>>> 
>>>>>>>>>> It doesn't seem to be network related, but can't rule
>>>>>>>>>> that out
>>>>>>>>>> completely.
>>>>>>>>>> 
>>>>>>>>>> I tried to bisect but the problem can be
>>>>>>>>>> intermittent. Some
>>>>>>>>>> runs I'll see a
>>>>>>>>>> problem in 3 out of 100 cycles, sometimes 0 out of
>>>>>>>>>> 100.
>>>>>>>>>> Sometimes I'll see it
>>>>>>>>>> 100 out of 100.
>>>>>>>>> 
>>>>>>>>> It's not clear from your problem report whether the
>>>>>>>>> problem
>>>>>>>>> appears
>>>>>>>>> when it's the server running v5.18-rc or the client.
>>>>>>>> 
>>>>>>>> That's because I don't know which it is. I'll do a quick
>>>>>>>> test and
>>>>>>>> find out. I
>>>>>>>> was testing the same kernel across both nodes.
>>>>>>> 
>>>>>>> Looks like it is the client.
>>>>>>> 
>>>>>>> server  client  result
>>>>>>> ------  ------  ------
>>>>>>> 5.17    5.17    Pass
>>>>>>> 5.17    5.18    Fail
>>>>>>> 5.18    5.18    Fail
>>>>>>> 5.18    5.17    Pass
>>>>>>> 
>>>>>>> Is there a patch for the client issue you mentioned that I
>>>>>>> could try?
>>>>>>> 
>>>>>>> -Denny
>>>>>> 
>>>>>> Try this one
>>>>> 
>>>>> Thanks for the patch. Unfortunately it doesn't seem to solve
>>>>> the issue, still
>>>>> see intermittent hangs. I applied it on top of -rc4:
>>>>> 
>>>>> copy /mnt/nfs_test/run_nfs_test.junk to
>>>>> /dev/shm/run_nfs_test.tmp...
>>>>> 
>>>>> real    2m6.072s
>>>>> user    0m0.002s
>>>>> sys     0m0.263s
>>>>> Done
>>>>> 
>>>>> While it was hung I checked the mem usage on the machine:
>>>>> 
>>>>> # free -h
>>>>>              total        used        free      shared 
>>>>> buff/cache   available
>>>>> Mem:           62Gi       871Mi        61Gi       342Mi      
>>>>> 889Mi        61Gi
>>>>> Swap:         4.0Gi          0B       4.0Gi
>>>>> 
>>>>> Doesn't appear to be under memory pressure.
>>>> 
>>>> Hi, since you know now that it is the client, perhaps a bisect
>>>> would be more successful?
>>> 
>>> I've been testing all week. I pulled the nfs-rdma tree that was
>>> sent to Linus
>>> for 5.18 and tested. I see the problem on pretty much all the
>>> patches. However
>>> it's the frequency that it hits which changes.
>>> 
>>> I'll see 1-5 cycles out of 2500 where the copy takes minutes up to:
>>> "NFS: Convert readdir page cache to use a cookie based index"
>>> 
>>> After this I start seeing it around 10 times in 500 and by the last
>>> patch 10
>>> times in less than 100.
>>> 
>>> Is there any kind of tracing/debugging I could turn on to get more
>>> insight on
>>> what is taking so long when it does go bad?
>>> 
>> 
>> Ran a test with -rc6 and this time see a hung task trace on the
>> console as well
>> as an NFS RPC error.
>> 
>> [32719.991175] nfs: RPC call returned error 512
>> .
>> .
>> .
>> [32933.285126] INFO: task kworker/u145:23:886141 blocked for more
>> than 122 seconds.
>> [32933.293543]       Tainted: G S                5.18.0-rc6 #1
>> [32933.299869] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>> disables this
>> message.
>> [32933.308740] task:kworker/u145:23 state:D stack:    0 pid:886141
>> ppid:     2
>> flags:0x00004000
>> [32933.318321] Workqueue: rpciod rpc_async_schedule [sunrpc]
>> [32933.324524] Call Trace:
>> [32933.327347]  <TASK>
>> [32933.329785]  __schedule+0x3dd/0x970
>> [32933.333783]  schedule+0x41/0xa0
>> [32933.337388]  xprt_request_dequeue_xprt+0xd1/0x140 [sunrpc]
>> [32933.343639]  ? prepare_to_wait+0xd0/0xd0
>> [32933.348123]  ? rpc_destroy_wait_queue+0x10/0x10 [sunrpc]
>> [32933.354183]  xprt_release+0x26/0x140 [sunrpc]
>> [32933.359168]  ? rpc_destroy_wait_queue+0x10/0x10 [sunrpc]
>> [32933.365225]  rpc_release_resources_task+0xe/0x50 [sunrpc]
>> [32933.371381]  __rpc_execute+0x2c5/0x4e0 [sunrpc]
>> [32933.376564]  ? __switch_to_asm+0x42/0x70
>> [32933.381046]  ? finish_task_switch+0xb2/0x2c0
>> [32933.385918]  rpc_async_schedule+0x29/0x40 [sunrpc]
>> [32933.391391]  process_one_work+0x1c8/0x390
>> [32933.395975]  worker_thread+0x30/0x360
>> [32933.400162]  ? process_one_work+0x390/0x390
>> [32933.404931]  kthread+0xd9/0x100
>> [32933.408536]  ? kthread_complete_and_exit+0x20/0x20
>> [32933.413984]  ret_from_fork+0x22/0x30
>> [32933.418074]  </TASK>
>> 
>> The call trace shows up again at 245, 368, and 491 seconds. Same
>> task, same trace.
>> 
>> 
>> 
>> 
> 
> That's very helpful. The above trace suggests that the RDMA code is
> leaking a call to xprt_unpin_rqst().

IMHO this is unlikely to be related to the performance
regression -- none of this code has changed in the past 5
kernel releases. Could be a different issue, though.

As is often the case in these situations, the INFO trace
above happens long after the issue that caused the missing
unpin. So... unless Dennis has a reproducer that can trigger
the issue frequently, I don't think there's much that can
be extracted from that.

Also "nfs: RPC call returned error 512" suggests someone
hit ^C at some point. It's always possible that the
xprt_rdma_free() path is missing an unpin. But again,
that's not likely to be related to performance.


> From what I can see, rpcrdma_reply_handler() will always call
> xprt_pin_rqst(), and that call is required to be matched by a call to
> xprt_unpin_rqst() before the handler exits. However those calls to
> xprt_unpin_rqst() appear to be scattered around the code, and it is not
> at all obvious that the requirement is being fulfilled.

I don't think unpin is required before rpcrdma_reply_handler()
exits. It /is/ required after xprt_complete_rqst(), and
sometimes that call has to be deferred until the NIC is
fully done with the RPC's buffers. The NIC is often not
finished with the buffers in the Receive completion handler.


> Chuck, why does that code need to be so complex?

The overall goal is that the upper layer cannot be awoken
until the NIC has finished with the RPC's header and
payload buffers. These buffers are treated differently
depending on how the send path chose to marshal the RPC.

And, all of these are hot paths, so it's important to
ensure that we are waiting/context-switching as
infrequently as possible. We wake the upper layer waiter
during an RDMA completion to reduce context switching.
The first completion does not always mean there's nothing
left for the NIC to do.

Ignoring the server backchannel, there is a single unpin
in the "complete without error" case in
rpcrdma_complete_rqst().

There are three call sites for rpcrdma_complete_rqst:

1. rpcrdma_reply_done - called when no explicit invalidation
is needed and the Receive completion fired after the Send
completion.

2. rpcrdma_sendctx_done - called when no explicit invalidation
is needed and the Send completion fired after the Receive
completion.

3. frwr_wc_localinv_done - called when explicit invalidation
was needed and that invalidation has completed. SQ ordering
guarantees that the Send has also completed at this point,
so a separate wait for Send completion is not needed.

These are the points when the NIC has finished with all of
the memory resources associated with an RPC, so that's
when it is safe to wake the waiter and unpin the RPC.

--
Chuck Lever





* Re: NFS regression between 5.17 and 5.18
  2022-05-13 14:59                   ` Chuck Lever III
@ 2022-05-13 15:19                     ` Trond Myklebust
  2022-05-13 18:53                       ` Chuck Lever III
  2022-05-17 13:40                     ` Dennis Dalessandro
  1 sibling, 1 reply; 25+ messages in thread
From: Trond Myklebust @ 2022-05-13 15:19 UTC (permalink / raw)
  To: chuck.lever; +Cc: linux-nfs, aglo, dennis.dalessandro

On Fri, 2022-05-13 at 14:59 +0000, Chuck Lever III wrote:
> 
> 
> > On May 13, 2022, at 9:30 AM, Trond Myklebust
> > <trondmy@hammerspace.com> wrote:
> > 
> > On Fri, 2022-05-13 at 07:58 -0400, Dennis Dalessandro wrote:
> > > On 5/6/22 9:24 AM, Dennis Dalessandro wrote:
> > > > On 4/29/22 9:37 AM, Chuck Lever III wrote:
> > > > > 
> > > > > 
> > > > > > On Apr 29, 2022, at 8:54 AM, Dennis Dalessandro
> > > > > > <dennis.dalessandro@cornelisnetworks.com> wrote:
> > > > > > 
> > > > > > On 4/28/22 3:56 PM, Trond Myklebust wrote:
> > > > > > > On Thu, 2022-04-28 at 15:47 -0400, Dennis Dalessandro
> > > > > > > wrote:
> > > > > > > > On 4/28/22 11:42 AM, Dennis Dalessandro wrote:
> > > > > > > > > On 4/28/22 10:57 AM, Chuck Lever III wrote:
> > > > > > > > > > > On Apr 28, 2022, at 9:05 AM, Dennis Dalessandro
> > > > > > > > > > > <dennis.dalessandro@cornelisnetworks.com> wrote:
> > > > > > > > > > > 
> > > > > > > > > > > Hi NFS folks,
> > > > > > > > > > > 
> > > > > > > > > > > I've noticed a pretty nasty regression in our NFS
> > > > > > > > > > > capability
> > > > > > > > > > > between 5.17 and
> > > > > > > > > > > 5.18-rc1. I've tried to bisect but not having any
> > > > > > > > > > > luck. The
> > > > > > > > > > > problem I'm seeing
> > > > > > > > > > > is it takes 3 minutes to copy a file from NFS to
> > > > > > > > > > > the
> > > > > > > > > > > local
> > > > > > > > > > > disk. When it should
> > > > > > > > > > > take less than half a second, which it did up
> > > > > > > > > > > through
> > > > > > > > > > > 5.17.
> > > > > > > > > > > 
> > > > > > > > > > > It doesn't seem to be network related, but can't
> > > > > > > > > > > rule
> > > > > > > > > > > that out
> > > > > > > > > > > completely.
> > > > > > > > > > > 
> > > > > > > > > > > I tried to bisect but the problem can be
> > > > > > > > > > > intermittent. Some
> > > > > > > > > > > runs I'll see a
> > > > > > > > > > > problem in 3 out of 100 cycles, sometimes 0 out
> > > > > > > > > > > of
> > > > > > > > > > > 100.
> > > > > > > > > > > Sometimes I'll see it
> > > > > > > > > > > 100 out of 100.
> > > > > > > > > > 
> > > > > > > > > > It's not clear from your problem report whether the
> > > > > > > > > > problem
> > > > > > > > > > appears
> > > > > > > > > > when it's the server running v5.18-rc or the
> > > > > > > > > > client.
> > > > > > > > > 
> > > > > > > > > That's because I don't know which it is. I'll do a
> > > > > > > > > quick
> > > > > > > > > test and
> > > > > > > > > find out. I
> > > > > > > > > was testing the same kernel across both nodes.
> > > > > > > > 
> > > > > > > > Looks like it is the client.
> > > > > > > > 
> > > > > > > > server  client  result
> > > > > > > > ------  ------  ------
> > > > > > > > 5.17    5.17    Pass
> > > > > > > > 5.17    5.18    Fail
> > > > > > > > 5.18    5.18    Fail
> > > > > > > > 5.18    5.17    Pass
> > > > > > > > 
> > > > > > > > Is there a patch for the client issue you mentioned
> > > > > > > > that I
> > > > > > > > could try?
> > > > > > > > 
> > > > > > > > -Denny
> > > > > > > 
> > > > > > > Try this one
> > > > > > 
> > > > > > Thanks for the patch. Unfortunately it doesn't seem to
> > > > > > solve
> > > > > > the issue, still
> > > > > > see intermittent hangs. I applied it on top of -rc4:
> > > > > > 
> > > > > > copy /mnt/nfs_test/run_nfs_test.junk to
> > > > > > /dev/shm/run_nfs_test.tmp...
> > > > > > 
> > > > > > real    2m6.072s
> > > > > > user    0m0.002s
> > > > > > sys     0m0.263s
> > > > > > Done
> > > > > > 
> > > > > > While it was hung I checked the mem usage on the machine:
> > > > > > 
> > > > > > # free -h
> > > > > >              total        used        free      shared 
> > > > > > buff/cache   available
> > > > > > Mem:           62Gi       871Mi        61Gi      
> > > > > > 342Mi      
> > > > > > 889Mi        61Gi
> > > > > > Swap:         4.0Gi          0B       4.0Gi
> > > > > > 
> > > > > > Doesn't appear to be under memory pressure.
> > > > > 
> > > > > Hi, since you know now that it is the client, perhaps a
> > > > > bisect
> > > > > would be more successful?
> > > > 
> > > > I've been testing all week. I pulled the nfs-rdma tree that was
> > > > sent to Linus
> > > > for 5.18 and tested. I see the problem on pretty much all the
> > > > patches. However
> > > > it's the frequency that it hits which changes.
> > > > 
> > > > I'll see 1-5 cycles out of 2500 where the copy takes minutes up
> > > > to:
> > > > "NFS: Convert readdir page cache to use a cookie based index"
> > > > 
> > > > After this I start seeing it around 10 times in 500 and by the
> > > > last
> > > > patch 10
> > > > times in less than 100.
> > > > 
> > > > Is there any kind of tracing/debugging I could turn on to get
> > > > more
> > > > insight on
> > > > what is taking so long when it does go bad?
> > > > 
> > > 
> > > Ran a test with -rc6 and this time see a hung task trace on the
> > > console as well
> > > as an NFS RPC error.
> > > 
> > > [32719.991175] nfs: RPC call returned error 512
> > > .
> > > .
> > > .
> > > [32933.285126] INFO: task kworker/u145:23:886141 blocked for more
> > > than 122 seconds.
> > > [32933.293543]       Tainted: G S                5.18.0-rc6 #1
> > > [32933.299869] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > > disables this
> > > message.
> > > [32933.308740] task:kworker/u145:23 state:D stack:    0
> > > pid:886141
> > > ppid:     2
> > > flags:0x00004000
> > > [32933.318321] Workqueue: rpciod rpc_async_schedule [sunrpc]
> > > [32933.324524] Call Trace:
> > > [32933.327347]  <TASK>
> > > [32933.329785]  __schedule+0x3dd/0x970
> > > [32933.333783]  schedule+0x41/0xa0
> > > [32933.337388]  xprt_request_dequeue_xprt+0xd1/0x140 [sunrpc]
> > > [32933.343639]  ? prepare_to_wait+0xd0/0xd0
> > > [32933.348123]  ? rpc_destroy_wait_queue+0x10/0x10 [sunrpc]
> > > [32933.354183]  xprt_release+0x26/0x140 [sunrpc]
> > > [32933.359168]  ? rpc_destroy_wait_queue+0x10/0x10 [sunrpc]
> > > [32933.365225]  rpc_release_resources_task+0xe/0x50 [sunrpc]
> > > [32933.371381]  __rpc_execute+0x2c5/0x4e0 [sunrpc]
> > > [32933.376564]  ? __switch_to_asm+0x42/0x70
> > > [32933.381046]  ? finish_task_switch+0xb2/0x2c0
> > > [32933.385918]  rpc_async_schedule+0x29/0x40 [sunrpc]
> > > [32933.391391]  process_one_work+0x1c8/0x390
> > > [32933.395975]  worker_thread+0x30/0x360
> > > [32933.400162]  ? process_one_work+0x390/0x390
> > > [32933.404931]  kthread+0xd9/0x100
> > > [32933.408536]  ? kthread_complete_and_exit+0x20/0x20
> > > [32933.413984]  ret_from_fork+0x22/0x30
> > > [32933.418074]  </TASK>
> > > 
> > > The call trace shows up again at 245, 368, and 491 seconds. Same
> > > task, same trace.
> > > 
> > > 
> > > 
> > > 
> > 
> > That's very helpful. The above trace suggests that the RDMA code is
> > leaking a call to xprt_unpin_rqst().
> 
> IMHO this is unlikely to be related to the performance
> regression -- none of this code has changed in the past 5
> kernel releases. Could be a different issue, though.
> 
> As is often the case in these situations, the INFO trace
> above happens long after the issue that caused the missing
> unpin. So... unless Dennis has a reproducer that can trigger
> the issue frequently, I don't think there's much that can
> be extracted from that.
> 
> Also "nfs: RPC call returned error 512" suggests someone
> hit ^C at some point. It's always possible that the
> xprt_rdma_free() path is missing an unpin. But again,
> that's not likely to be related to performance.
> 
> 
> > From what I can see, rpcrdma_reply_handler() will always call
> > xprt_pin_rqst(), and that call is required to be matched by a call
> > to
> > xprt_unpin_rqst() before the handler exits. However those calls to
> > xprt_unpin_rqst() appear to be scattered around the code, and it is
> > not
> > at all obvious that the requirement is being fulfilled.
> 
> I don't think unpin is required before rpcrdma_reply_handler()
> exits. It /is/ required after xprt_complete_rqst(), and
> sometimes that call has to be deferred until the NIC is
> fully done with the RPC's buffers. The NIC is often not
> finished with the buffers in the Receive completion handler.
> 
> 
> > Chuck, why does that code need to be so complex?
> 
> The overall goal is that the upper layer cannot be awoken
> until the NIC has finished with the RPC's header and
> payload buffers. These buffers are treated differently
> depending on how the send path chose to marshal the RPC.
> 
> And, all of these are hot paths, so it's important to
> ensure that we are waiting/context-switching as
> infrequently as possible. We wake the upper layer waiter
> during an RDMA completion to reduce context switching.
> The first completion does not always mean there's nothing
> left for the NIC to do.
> 
> Ignoring the server backchannel, there is a single unpin
> in the "complete without error" case in
> rpcrdma_complete_rqst().
> 
> There are three call sites for rpcrdma_complete_rqst:
> 
> 1. rpcrdma_reply_done - called when no explicit invalidation
> is needed and the Receive completion fired after the Send
> completion.
> 
> 2. rpcrdma_sendctx_done - called when no explicit invalidation
> is needed and the Send completion fired after the Receive
> completion.
> 
> 3. frwr_wc_localinv_done - called when explicit invalidation
> was needed and that invalidation has completed. SQ ordering
> guarantees that the Send has also completed at this point,
> so a separate wait for Send completion is not needed.
> 
> These are the points when the NIC has finished with all of
> the memory resources associated with an RPC, so that's
> when it is safe to wake the waiter and unpin the RPC.
> 

From what I can see, what's important in the case of RDMA is to ensure
that we call exactly one of frwr_unmap_async(), frwr_unmap_sync()
or rpcrdma_complete_rqst(), and that whichever operation we select must
complete before we call xprt->ops->free_slot().

xprt_pin_rqst()/xprt_unpin_rqst() can help to ensure exclusion between
those three operations, but it is not required if there are other ways to
do so.
In fact, since xprt_wait_on_pinned_rqst() forces a synchronous sleep in
xprt_release(), it would be much preferable to find a better solution
than to pin the request if your concern is performance.
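
For reference, the pin/unpin contract itself is simple. A minimal sketch
(illustrative only, not the actual rpcrdma code; the deferred completion
discussed above is elided):

#include <linux/sunrpc/xprt.h>

static void example_reply_handler(struct rpc_xprt *xprt, __be32 xid,
				  int copied)
{
	struct rpc_rqst *req;

	spin_lock(&xprt->queue_lock);
	req = xprt_lookup_rqst(xprt, xid);
	if (!req) {
		spin_unlock(&xprt->queue_lock);
		return;
	}
	xprt_pin_rqst(req);	/* keep req alive once the lock drops */
	spin_unlock(&xprt->queue_lock);

	/* ... hand the reply off; completion may be deferred ... */

	spin_lock(&xprt->queue_lock);
	xprt_complete_rqst(req->rq_task, copied);
	xprt_unpin_rqst(req);	/* exactly one unpin per pin */
	spin_unlock(&xprt->queue_lock);
}

The point is that every exit path, including the error and signal paths,
has to reach that single matching unpin.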

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: NFS regression between 5.17 and 5.18
  2022-05-13 15:19                     ` Trond Myklebust
@ 2022-05-13 18:53                       ` Chuck Lever III
  0 siblings, 0 replies; 25+ messages in thread
From: Chuck Lever III @ 2022-05-13 18:53 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Linux NFS Mailing List, Olga Kornievskaia, dennis.dalessandro



> On May 13, 2022, at 11:19 AM, Trond Myklebust <trondmy@hammerspace.com> wrote:
> 
> On Fri, 2022-05-13 at 14:59 +0000, Chuck Lever III wrote:
>> 
>> 
>>> On May 13, 2022, at 9:30 AM, Trond Myklebust
>>> <trondmy@hammerspace.com> wrote:
>>> 
>>> On Fri, 2022-05-13 at 07:58 -0400, Dennis Dalessandro wrote:
>>>> On 5/6/22 9:24 AM, Dennis Dalessandro wrote:
>>>>> On 4/29/22 9:37 AM, Chuck Lever III wrote:
>>>>>> 
>>>>>> 
>>>>>>> On Apr 29, 2022, at 8:54 AM, Dennis Dalessandro
>>>>>>> <dennis.dalessandro@cornelisnetworks.com> wrote:
>>>>>>> 
>>>>>>> On 4/28/22 3:56 PM, Trond Myklebust wrote:
>>>>>>>> On Thu, 2022-04-28 at 15:47 -0400, Dennis Dalessandro
>>>>>>>> wrote:
>>>>>>>>> On 4/28/22 11:42 AM, Dennis Dalessandro wrote:
>>>>>>>>>> On 4/28/22 10:57 AM, Chuck Lever III wrote:
>>>>>>>>>>>> On Apr 28, 2022, at 9:05 AM, Dennis Dalessandro
>>>>>>>>>>>> <dennis.dalessandro@cornelisnetworks.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Hi NFS folks,
>>>>>>>>>>>> 
>>>>>>>>>>>> I've noticed a pretty nasty regression in our NFS
>>>>>>>>>>>> capability
>>>>>>>>>>>> between 5.17 and
>>>>>>>>>>>> 5.18-rc1. I've tried to bisect but not having any
>>>>>>>>>>>> luck. The
>>>>>>>>>>>> problem I'm seeing
>>>>>>>>>>>> is it takes 3 minutes to copy a file from NFS to
>>>>>>>>>>>> the
>>>>>>>>>>>> local
>>>>>>>>>>>> disk. When it should
>>>>>>>>>>>> take less than half a second, which it did up
>>>>>>>>>>>> through
>>>>>>>>>>>> 5.17.
>>>>>>>>>>>> 
>>>>>>>>>>>> It doesn't seem to be network related, but can't
>>>>>>>>>>>> rule
>>>>>>>>>>>> that out
>>>>>>>>>>>> completely.
>>>>>>>>>>>> 
>>>>>>>>>>>> I tried to bisect but the problem can be
>>>>>>>>>>>> intermittent. Some
>>>>>>>>>>>> runs I'll see a
>>>>>>>>>>>> problem in 3 out of 100 cycles, sometimes 0 out
>>>>>>>>>>>> of
>>>>>>>>>>>> 100.
>>>>>>>>>>>> Sometimes I'll see it
>>>>>>>>>>>> 100 out of 100.
>>>>>>>>>>> 
>>>>>>>>>>> It's not clear from your problem report whether the
>>>>>>>>>>> problem
>>>>>>>>>>> appears
>>>>>>>>>>> when it's the server running v5.18-rc or the
>>>>>>>>>>> client.
>>>>>>>>>> 
>>>>>>>>>> That's because I don't know which it is. I'll do a
>>>>>>>>>> quick
>>>>>>>>>> test and
>>>>>>>>>> find out. I
>>>>>>>>>> was testing the same kernel across both nodes.
>>>>>>>>> 
>>>>>>>>> Looks like it is the client.
>>>>>>>>> 
>>>>>>>>> server  client  result
>>>>>>>>> ------  ------  ------
>>>>>>>>> 5.17    5.17    Pass
>>>>>>>>> 5.17    5.18    Fail
>>>>>>>>> 5.18    5.18    Fail
>>>>>>>>> 5.18    5.17    Pass
>>>>>>>>> 
>>>>>>>>> Is there a patch for the client issue you mentioned
>>>>>>>>> that I
>>>>>>>>> could try?
>>>>>>>>> 
>>>>>>>>> -Denny
>>>>>>>> 
>>>>>>>> Try this one
>>>>>>> 
>>>>>>> Thanks for the patch. Unfortunately it doesn't seem to
>>>>>>> solve
>>>>>>> the issue, still
>>>>>>> see intermittent hangs. I applied it on top of -rc4:
>>>>>>> 
>>>>>>> copy /mnt/nfs_test/run_nfs_test.junk to
>>>>>>> /dev/shm/run_nfs_test.tmp...
>>>>>>> 
>>>>>>> real    2m6.072s
>>>>>>> user    0m0.002s
>>>>>>> sys     0m0.263s
>>>>>>> Done
>>>>>>> 
>>>>>>> While it was hung I checked the mem usage on the machine:
>>>>>>> 
>>>>>>> # free -h
>>>>>>>              total        used        free      shared 
>>>>>>> buff/cache   available
>>>>>>> Mem:           62Gi       871Mi        61Gi      
>>>>>>> 342Mi      
>>>>>>> 889Mi        61Gi
>>>>>>> Swap:         4.0Gi          0B       4.0Gi
>>>>>>> 
>>>>>>> Doesn't appear to be under memory pressure.
>>>>>> 
>>>>>> Hi, since you know now that it is the client, perhaps a
>>>>>> bisect
>>>>>> would be more successful?
>>>>> 
>>>>> I've been testing all week. I pulled the nfs-rdma tree that was
>>>>> sent to Linus
>>>>> for 5.18 and tested. I see the problem on pretty much all the
>>>>> patches. However
>>>>> it's the frequency that it hits which changes.
>>>>> 
>>>>> I'll see 1-5 cycles out of 2500 where the copy takes minutes up
>>>>> to:
>>>>> "NFS: Convert readdir page cache to use a cookie based index"
>>>>> 
>>>>> After this I start seeing it around 10 times in 500 and by the
>>>>> last
>>>>> patch 10
>>>>> times in less than 100.
>>>>> 
>>>>> Is there any kind of tracing/debugging I could turn on to get
>>>>> more
>>>>> insight on
>>>>> what is taking so long when it does go bad?
>>>>> 
>>>> 
>>>> Ran a test with -rc6 and this time see a hung task trace on the
>>>> console as well
>>>> as an NFS RPC error.
>>>> 
>>>> [32719.991175] nfs: RPC call returned error 512
>>>> .
>>>> .
>>>> .
>>>> [32933.285126] INFO: task kworker/u145:23:886141 blocked for more
>>>> than 122 seconds.
>>>> [32933.293543]       Tainted: G S                5.18.0-rc6 #1
>>>> [32933.299869] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>>>> disables this
>>>> message.
>>>> [32933.308740] task:kworker/u145:23 state:D stack:    0
>>>> pid:886141
>>>> ppid:     2
>>>> flags:0x00004000
>>>> [32933.318321] Workqueue: rpciod rpc_async_schedule [sunrpc]
>>>> [32933.324524] Call Trace:
>>>> [32933.327347]  <TASK>
>>>> [32933.329785]  __schedule+0x3dd/0x970
>>>> [32933.333783]  schedule+0x41/0xa0
>>>> [32933.337388]  xprt_request_dequeue_xprt+0xd1/0x140 [sunrpc]
>>>> [32933.343639]  ? prepare_to_wait+0xd0/0xd0
>>>> [32933.348123]  ? rpc_destroy_wait_queue+0x10/0x10 [sunrpc]
>>>> [32933.354183]  xprt_release+0x26/0x140 [sunrpc]
>>>> [32933.359168]  ? rpc_destroy_wait_queue+0x10/0x10 [sunrpc]
>>>> [32933.365225]  rpc_release_resources_task+0xe/0x50 [sunrpc]
>>>> [32933.371381]  __rpc_execute+0x2c5/0x4e0 [sunrpc]
>>>> [32933.376564]  ? __switch_to_asm+0x42/0x70
>>>> [32933.381046]  ? finish_task_switch+0xb2/0x2c0
>>>> [32933.385918]  rpc_async_schedule+0x29/0x40 [sunrpc]
>>>> [32933.391391]  process_one_work+0x1c8/0x390
>>>> [32933.395975]  worker_thread+0x30/0x360
>>>> [32933.400162]  ? process_one_work+0x390/0x390
>>>> [32933.404931]  kthread+0xd9/0x100
>>>> [32933.408536]  ? kthread_complete_and_exit+0x20/0x20
>>>> [32933.413984]  ret_from_fork+0x22/0x30
>>>> [32933.418074]  </TASK>
>>>> 
>>>> The call trace shows up again at 245, 368, and 491 seconds. Same
>>>> task, same trace.
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> That's very helpful. The above trace suggests that the RDMA code is
>>> leaking a call to xprt_unpin_rqst().
>> 
>> IMHO this is unlikely to be related to the performance
>> regression -- none of this code has changed in the past 5
>> kernel releases. Could be a different issue, though.
>> 
>> As is often the case in these situations, the INFO trace
>> above happens long after the issue that caused the missing
>> unpin. So... unless Dennis has a reproducer that can trigger
>> the issue frequently, I don't think there's much that can
>> be extracted from that.
>> 
>> Also "nfs: RPC call returned error 512" suggests someone
>> hit ^C at some point. It's always possible that the
>> xprt_rdma_free() path is missing an unpin. But again,
>> that's not likely to be related to performance.
>> 
>> 
>>> From what I can see, rpcrdma_reply_handler() will always call
>>> xprt_pin_rqst(), and that call is required to be matched by a call
>>> to
>>> xprt_unpin_rqst() before the handler exits. However those calls to
>>> xprt_unpin_rqst() appear to be scattered around the code, and it is
>>> not
>>> at all obvious that the requirement is being fulfilled.
>> 
>> I don't think unpin is required before rpcrdma_reply_handler()
>> exits. It /is/ required after xprt_complete_rqst(), and
>> sometimes that call has to be deferred until the NIC is
>> fully done with the RPC's buffers. The NIC is often not
>> finished with the buffers in the Receive completion handler.
>> 
>> 
>>> Chuck, why does that code need to be so complex?
>> 
>> The overall goal is that the upper layer cannot be awoken
>> until the NIC has finished with the RPC's header and
>> payload buffers. These buffers are treated differently
>> depending on how the send path chose to marshal the RPC.
>> 
>> And, all of these are hot paths, so it's important to
>> ensure that we are waiting/context-switching as
>> infrequently as possible. We wake the upper layer waiter
>> during an RDMA completion to reduce context switching.
>> The first completion does not always mean there's nothing
>> left for the NIC to do.
>> 
>> Ignoring the server backchannel, there is a single unpin
>> in the "complete without error" case in
>> rpcrdma_complete_rqst().
>> 
>> There are three call sites for rpcrdma_complete_rqst:
>> 
>> 1. rpcrdma_reply_done - called when no explicit invalidation
>> is needed and the Receive completion fired after the Send
>> completion.
>> 
>> 2. rpcrdma_sendctx_done - called when no explicit invalidation
>> is needed and the Send completion fired after the Receive
>> completion.
>> 
>> 3. frwr_wc_localinv_done - called when explicit invalidation
>> was needed and that invalidation has completed. SQ ordering
>> guarantees that the Send has also completed at this point,
>> so a separate wait for Send completion is not needed.
>> 
>> These are the points when the NIC has finished with all of
>> the memory resources associated with an RPC, so that's
>> when it is safe to wake the waiter and unpin the RPC.
>> 
> 
> From what I can see, what's important in the case of RDMA is to ensure
> that we call exactly 1 of either frwr_unmap_async(), frwr_unmap_sync()
> or rpcrdma_complete_rqst(),

No, all of these mechanisms are supposed to end up at
rpcrdma_complete_rqst(). That's where the RPC/RDMA transport
header is parsed.

Reply completion processing is kicked off by a Receive completion.
If local invalidation is unneeded, then we have to wait explicitly
for the Send completion too. Otherwise, a LOCAL_INV is posted and
we wait for that to complete; that completion means the previous
Send is complete, so waiting for Send completion in that case is
unnecessary.

When all of that dust settles, rpcrdma_complete_rqst() is called
to wake up the upper layer. That's basically how xprtsock did it
when I wrote all this years ago.

(The Send completion is needed because the NIC is free to continue
to read from sndbuf until then. We can't release that buffer until
completion because otherwise a NIC retransmission could expose
different contents on the wire, or worse, the receiving end
would see garbage. It works a bit like TCP retransmission.)
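
In rough pseudocode, the ordering works out to something like the sketch
below. All of the names here are made up; the comments point at the real
handlers. (The real code resolves the Send-vs-Receive race with reference
counting on the request rather than plain flags, but the dispatch is the
same idea.)

#include <stdbool.h>

struct example_rpc {
	bool needs_explicit_invalidation;
	bool send_done;
	bool receive_done;
};

/* ~ rpcrdma_complete_rqst(): parse the RPC/RDMA header, wake the waiter */
static void example_wake_upper_layer(struct example_rpc *rpc) { (void)rpc; }

/* ~ frwr_unmap_async(): post LOCAL_INV; its completion finishes the RPC */
static void example_post_local_inv(struct example_rpc *rpc) { (void)rpc; }

static void example_receive_completion(struct example_rpc *rpc)
{
	if (rpc->needs_explicit_invalidation) {
		/* frwr_wc_localinv_done() wakes the upper layer; SQ
		 * ordering means the Send has completed by then too. */
		example_post_local_inv(rpc);
		return;
	}
	rpc->receive_done = true;
	if (rpc->send_done)
		example_wake_upper_layer(rpc);	/* the rpcrdma_reply_done case */
}

static void example_send_completion(struct example_rpc *rpc)
{
	rpc->send_done = true;
	if (rpc->receive_done && !rpc->needs_explicit_invalidation)
		example_wake_upper_layer(rpc);	/* the rpcrdma_sendctx_done case */
}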


> and that whichever operation we select must
> complete before we call xprt->ops->free_slot().

The NIC has to tell xprtrdma that its memory operations are
complete before RPC can XDR decode either the RPC header or
the NFS request itself. That's well before ->free_slot(), IIRC.


> xprt_pin_rqst()/xprt_unpin_rqst() can help to ensure exclusion between
> those 3 operations, but it is not required if there are other ways to
> do so.
> In fact since xprt_wait_on_pinned_rqst() forces a synchronous sleep in
> xprt_release(), it would be much preferable to find a better solution
> than to pin the request if you concern is performance.

In particular, my performance concern is to avoid waits and
context switches, because those add latency, sometimes quite a
bit. I've tried to build the completion process on only the
unavoidable context switches. I don't immediately see how to
do this by adding wait_on_pinned to the fray.

frwr_unmap_sync() is used only to deal with ^C, so it can be
ugly and/or slow.

I am open to discussion about the structure of this code. The
simpler the better, but there are lots of moving parts. So
feedback is always welcome.


--
Chuck Lever




^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: NFS regression between 5.17 and 5.18
  2022-05-13 14:59                   ` Chuck Lever III
  2022-05-13 15:19                     ` Trond Myklebust
@ 2022-05-17 13:40                     ` Dennis Dalessandro
  2022-05-17 14:02                       ` Chuck Lever III
  1 sibling, 1 reply; 25+ messages in thread
From: Dennis Dalessandro @ 2022-05-17 13:40 UTC (permalink / raw)
  To: Chuck Lever III, Trond Myklebust
  Cc: Olga Kornievskaia, Linux NFS Mailing List

On 5/13/22 10:59 AM, Chuck Lever III wrote:
>>>
>>> Ran a test with -rc6 and this time see a hung task trace on the
>>> console as well
>>> as an NFS RPC error.
>>>
>>> [32719.991175] nfs: RPC call returned error 512
>>> .
>>> .
>>> .
>>> [32933.285126] INFO: task kworker/u145:23:886141 blocked for more
>>> than 122 seconds.
>>> [32933.293543]       Tainted: G S                5.18.0-rc6 #1
>>> [32933.299869] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>>> disables this
>>> message.
>>> [32933.308740] task:kworker/u145:23 state:D stack:    0 pid:886141
>>> ppid:     2
>>> flags:0x00004000
>>> [32933.318321] Workqueue: rpciod rpc_async_schedule [sunrpc]
>>> [32933.324524] Call Trace:
>>> [32933.327347]  <TASK>
>>> [32933.329785]  __schedule+0x3dd/0x970
>>> [32933.333783]  schedule+0x41/0xa0
>>> [32933.337388]  xprt_request_dequeue_xprt+0xd1/0x140 [sunrpc]
>>> [32933.343639]  ? prepare_to_wait+0xd0/0xd0
>>> [32933.348123]  ? rpc_destroy_wait_queue+0x10/0x10 [sunrpc]
>>> [32933.354183]  xprt_release+0x26/0x140 [sunrpc]
>>> [32933.359168]  ? rpc_destroy_wait_queue+0x10/0x10 [sunrpc]
>>> [32933.365225]  rpc_release_resources_task+0xe/0x50 [sunrpc]
>>> [32933.371381]  __rpc_execute+0x2c5/0x4e0 [sunrpc]
>>> [32933.376564]  ? __switch_to_asm+0x42/0x70
>>> [32933.381046]  ? finish_task_switch+0xb2/0x2c0
>>> [32933.385918]  rpc_async_schedule+0x29/0x40 [sunrpc]
>>> [32933.391391]  process_one_work+0x1c8/0x390
>>> [32933.395975]  worker_thread+0x30/0x360
>>> [32933.400162]  ? process_one_work+0x390/0x390
>>> [32933.404931]  kthread+0xd9/0x100
>>> [32933.408536]  ? kthread_complete_and_exit+0x20/0x20
>>> [32933.413984]  ret_from_fork+0x22/0x30
>>> [32933.418074]  </TASK>
>>>
>>> The call trace shows up again at 245, 368, and 491 seconds. Same
>>> task, same trace.
>>>
>>>
>>>
>>>
>>
>> That's very helpful. The above trace suggests that the RDMA code is
>> leaking a call to xprt_unpin_rqst().
> 
> IMHO this is unlikely to be related to the performance
> regression -- none of this code has changed in the past 5
> kernel releases. Could be a different issue, though.
> 
> As is often the case in these situations, the INFO trace
> above happens long after the issue that caused the missing
> unpin. So... unless Dennis has a reproducer that can trigger
> the issue frequently, I don't think there's much that can
> be extracted from that.

To be fair, I've only seen this one time and have had the performance regression
since -rc1.

> Also "nfs: RPC call returned error 512" suggests someone
> hit ^C at some point. It's always possible that the
> xprt_rdma_free() path is missing an unpin. But again,
> that's not likely to be related to performance.

I've checked our test code, and after 10 minutes it does give up trying to do the
NFS copies and aborts (SIGINT) the test.

So in all my tests and bisect attempts, it seems it has been possible for quite
some time to hit a slow NFS operation that hangs for minutes. However, in 5.18
it gets much worse.

Any likely places I should add traces to try and find out what's stuck or taking
time?

-Denny

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: NFS regression between 5.17 and 5.18
  2022-05-17 13:40                     ` Dennis Dalessandro
@ 2022-05-17 14:02                       ` Chuck Lever III
  2022-06-20  7:46                         ` Thorsten Leemhuis
  0 siblings, 1 reply; 25+ messages in thread
From: Chuck Lever III @ 2022-05-17 14:02 UTC (permalink / raw)
  To: Dennis Dalessandro
  Cc: Trond Myklebust, Olga Kornievskaia, Linux NFS Mailing List



> On May 17, 2022, at 9:40 AM, Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> wrote:
> 
> On 5/13/22 10:59 AM, Chuck Lever III wrote:
>>>> 
>>>> Ran a test with -rc6 and this time see a hung task trace on the
>>>> console as well
>>>> as an NFS RPC error.
>>>> 
>>>> [32719.991175] nfs: RPC call returned error 512
>>>> .
>>>> .
>>>> .
>>>> [32933.285126] INFO: task kworker/u145:23:886141 blocked for more
>>>> than 122 seconds.
>>>> [32933.293543]       Tainted: G S                5.18.0-rc6 #1
>>>> [32933.299869] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>>>> disables this
>>>> message.
>>>> [32933.308740] task:kworker/u145:23 state:D stack:    0 pid:886141
>>>> ppid:     2
>>>> flags:0x00004000
>>>> [32933.318321] Workqueue: rpciod rpc_async_schedule [sunrpc]
>>>> [32933.324524] Call Trace:
>>>> [32933.327347]  <TASK>
>>>> [32933.329785]  __schedule+0x3dd/0x970
>>>> [32933.333783]  schedule+0x41/0xa0
>>>> [32933.337388]  xprt_request_dequeue_xprt+0xd1/0x140 [sunrpc]
>>>> [32933.343639]  ? prepare_to_wait+0xd0/0xd0
>>>> [32933.348123]  ? rpc_destroy_wait_queue+0x10/0x10 [sunrpc]
>>>> [32933.354183]  xprt_release+0x26/0x140 [sunrpc]
>>>> [32933.359168]  ? rpc_destroy_wait_queue+0x10/0x10 [sunrpc]
>>>> [32933.365225]  rpc_release_resources_task+0xe/0x50 [sunrpc]
>>>> [32933.371381]  __rpc_execute+0x2c5/0x4e0 [sunrpc]
>>>> [32933.376564]  ? __switch_to_asm+0x42/0x70
>>>> [32933.381046]  ? finish_task_switch+0xb2/0x2c0
>>>> [32933.385918]  rpc_async_schedule+0x29/0x40 [sunrpc]
>>>> [32933.391391]  process_one_work+0x1c8/0x390
>>>> [32933.395975]  worker_thread+0x30/0x360
>>>> [32933.400162]  ? process_one_work+0x390/0x390
>>>> [32933.404931]  kthread+0xd9/0x100
>>>> [32933.408536]  ? kthread_complete_and_exit+0x20/0x20
>>>> [32933.413984]  ret_from_fork+0x22/0x30
>>>> [32933.418074]  </TASK>
>>>> 
>>>> The call trace shows up again at 245, 368, and 491 seconds. Same
>>>> task, same trace.
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> That's very helpful. The above trace suggests that the RDMA code is
>>> leaking a call to xprt_unpin_rqst().
>> 
>> IMHO this is unlikely to be related to the performance
>> regression -- none of this code has changed in the past 5
>> kernel releases. Could be a different issue, though.
>> 
>> As is often the case in these situations, the INFO trace
>> above happens long after the issue that caused the missing
>> unpin. So... unless Dennis has a reproducer that can trigger
>> the issue frequently, I don't think there's much that can
>> be extracted from that.
> 
> To be fair, I've only seen this one time and have had the performance regression
> since -rc1.
> 
>> Also "nfs: RPC call returned error 512" suggests someone
>> hit ^C at some point. It's always possible that the
>> xprt_rdma_free() path is missing an unpin. But again,
>> that's not likely to be related to performance.
> 
> I've checked our test code and after 10 minutes it does give up trying to do the
> NFS copies and aborts (SIG_INT) the test.

After sleeping on it, I'm fairly certain the stack trace
above is a result of a gap in how xprtrdma handles a
signaled RPC.

Signal handling in that code is pretty hard to test, so not
surprising that there's a lingering bug or two. One idea I
had was to add a fault injector in the RPC scheduler to
throw signals at random. I think it can be done without
perturbing the hot path. Maybe I'll post an RFC patch.
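
For the archives, a rough sketch of the fault-injection idea -- untested
and illustrative only (the hook point in the scheduler is a guess), though
should_fail() and rpc_signal_task() do already exist:

#include <linux/fault-inject.h>
#include <linux/sunrpc/sched.h>

#ifdef CONFIG_FAULT_INJECTION
static DECLARE_FAULT_ATTR(fail_sunrpc_signal);

/* Called occasionally from the RPC scheduler: pretend the user hit ^C
 * so the transport teardown paths (xprt_rdma_free() and friends) get
 * exercised under load. */
static void rpc_maybe_inject_signal(struct rpc_task *task)
{
	if (should_fail(&fail_sunrpc_signal, 1))
		rpc_signal_task(task);	/* task sees ERESTARTSYS (512) */
}
#else
static inline void rpc_maybe_inject_signal(struct rpc_task *task) { }
#endif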


> So in all my tests and bisect attempts it seems the possibility to hit a slow
> NFS operation that hangs for minutes has been possible for quite some time.
> However in 5.18 it gets much worse.
> 
> Any likely places I should add traces to try and find out what's stuck or taking
> time?

There's been a lot of churn in that area in recent releases,
so I'm not familiar with the existing tracepoints. Maybe
Ben or Trond could provide some guidance.


--
Chuck Lever




^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: NFS regression between 5.17 and 5.18
  2022-05-17 14:02                       ` Chuck Lever III
@ 2022-06-20  7:46                         ` Thorsten Leemhuis
  2022-06-20 14:11                           ` Chuck Lever III
  0 siblings, 1 reply; 25+ messages in thread
From: Thorsten Leemhuis @ 2022-06-20  7:46 UTC (permalink / raw)
  To: Chuck Lever III, Dennis Dalessandro
  Cc: Trond Myklebust, Olga Kornievskaia, Linux NFS Mailing List

Dennis, Chuck, I have the below issue on the list of tracked regressions.
What's the status? Has any progress been made? Or is this not really a
regression and can it be ignored?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.

#regzbot poke
##regzbot unlink: https://bugzilla.kernel.org/show_bug.cgi?id=215890

On 17.05.22 16:02, Chuck Lever III wrote:
>> On May 17, 2022, at 9:40 AM, Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> wrote:
>>
>> On 5/13/22 10:59 AM, Chuck Lever III wrote:
>>>>>
>>>>> Ran a test with -rc6 and this time see a hung task trace on the
>>>>> console as well
>>>>> as an NFS RPC error.
>>>>>
>>>>> [32719.991175] nfs: RPC call returned error 512
>>>>> .
>>>>> .
>>>>> .
>>>>> [32933.285126] INFO: task kworker/u145:23:886141 blocked for more
>>>>> than 122 seconds.
>>>>> [32933.293543]       Tainted: G S                5.18.0-rc6 #1
>>>>> [32933.299869] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>>>>> disables this
>>>>> message.
>>>>> [32933.308740] task:kworker/u145:23 state:D stack:    0 pid:886141
>>>>> ppid:     2
>>>>> flags:0x00004000
>>>>> [32933.318321] Workqueue: rpciod rpc_async_schedule [sunrpc]
>>>>> [32933.324524] Call Trace:
>>>>> [32933.327347]  <TASK>
>>>>> [32933.329785]  __schedule+0x3dd/0x970
>>>>> [32933.333783]  schedule+0x41/0xa0
>>>>> [32933.337388]  xprt_request_dequeue_xprt+0xd1/0x140 [sunrpc]
>>>>> [32933.343639]  ? prepare_to_wait+0xd0/0xd0
>>>>> [32933.348123]  ? rpc_destroy_wait_queue+0x10/0x10 [sunrpc]
>>>>> [32933.354183]  xprt_release+0x26/0x140 [sunrpc]
>>>>> [32933.359168]  ? rpc_destroy_wait_queue+0x10/0x10 [sunrpc]
>>>>> [32933.365225]  rpc_release_resources_task+0xe/0x50 [sunrpc]
>>>>> [32933.371381]  __rpc_execute+0x2c5/0x4e0 [sunrpc]
>>>>> [32933.376564]  ? __switch_to_asm+0x42/0x70
>>>>> [32933.381046]  ? finish_task_switch+0xb2/0x2c0
>>>>> [32933.385918]  rpc_async_schedule+0x29/0x40 [sunrpc]
>>>>> [32933.391391]  process_one_work+0x1c8/0x390
>>>>> [32933.395975]  worker_thread+0x30/0x360
>>>>> [32933.400162]  ? process_one_work+0x390/0x390
>>>>> [32933.404931]  kthread+0xd9/0x100
>>>>> [32933.408536]  ? kthread_complete_and_exit+0x20/0x20
>>>>> [32933.413984]  ret_from_fork+0x22/0x30
>>>>> [32933.418074]  </TASK>
>>>>>
>>>>> The call trace shows up again at 245, 368, and 491 seconds. Same
>>>>> task, same trace.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>> That's very helpful. The above trace suggests that the RDMA code is
>>>> leaking a call to xprt_unpin_rqst().
>>>
>>> IMHO this is unlikely to be related to the performance
>>> regression -- none of this code has changed in the past 5
>>> kernel releases. Could be a different issue, though.
>>>
>>> As is often the case in these situations, the INFO trace
>>> above happens long after the issue that caused the missing
>>> unpin. So... unless Dennis has a reproducer that can trigger
>>> the issue frequently, I don't think there's much that can
>>> be extracted from that.
>>
>> To be fair, I've only seen this one time and have had the performance regression
>> since -rc1.
>>
>>> Also "nfs: RPC call returned error 512" suggests someone
>>> hit ^C at some point. It's always possible that the
>>> xprt_rdma_free() path is missing an unpin. But again,
>>> that's not likely to be related to performance.
>>
>> I've checked our test code and after 10 minutes it does give up trying to do the
>> NFS copies and aborts (SIG_INT) the test.
> 
> After sleeping on it, I'm fairly certain the stack trace
> above is a result of a gap in how xprtrdma handles a
> signaled RPC.
> 
> Signal handling in that code is pretty hard to test, so not
> surprising that there's a lingering bug or two. One idea I
> had was to add a fault injector in the RPC scheduler to
> throw signals at random. I think it can be done without
> perturbing the hot path. Maybe I'll post an RFC patch.
> 
> 
>> So in all my tests and bisect attempts it seems the possibility to hit a slow
>> NFS operation that hangs for minutes has been possible for quite some time.
>> However in 5.18 it gets much worse.
>>
>> Any likely places I should add traces to try and find out what's stuck or taking
>> time?
> 
> There's been a lot of churn in that area in recent releases,
> so I'm not familiar with the existing tracepoints. Maybe
> Ben or Trond could provide some guidance.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: NFS regression between 5.17 and 5.18
  2022-06-20  7:46                         ` Thorsten Leemhuis
@ 2022-06-20 14:11                           ` Chuck Lever III
  2022-06-20 14:29                             ` Thorsten Leemhuis
  0 siblings, 1 reply; 25+ messages in thread
From: Chuck Lever III @ 2022-06-20 14:11 UTC (permalink / raw)
  To: Thorsten Leemhuis
  Cc: Dennis Dalessandro, Trond Myklebust, Olga Kornievskaia,
	Linux NFS Mailing List



> On Jun 20, 2022, at 3:46 AM, Thorsten Leemhuis <regressions@leemhuis.info> wrote:
> 
> Dennis, Chuck, I have below issue on the list of tracked regressions.
> What's the status? Has any progress been made? Or is this not really a
> regression and can be ignored?
> 
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> 
> P.S.: As the Linux kernel's regression tracker I deal with a lot of
> reports and sometimes miss something important when writing mails like
> this. If that's the case here, don't hesitate to tell me in a public
> reply, it's in everyone's interest to set the public record straight.
> 
> #regzbot poke
> ##regzbot unlink: https://bugzilla.kernel.org/show_bug.cgi?id=215890

The above link points to an Apple trackpad bug.

The bug described all the way at the bottom was the original problem
report. I believe this is an NFS client issue. We are waiting for
a response from the NFS client maintainers to help Dennis track
this down.


> On 17.05.22 16:02, Chuck Lever III wrote:
>>> On May 17, 2022, at 9:40 AM, Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> wrote:
>>> 
>>> On 5/13/22 10:59 AM, Chuck Lever III wrote:
>>>>>> 
>>>>>> Ran a test with -rc6 and this time see a hung task trace on the
>>>>>> console as well
>>>>>> as an NFS RPC error.
>>>>>> 
>>>>>> [32719.991175] nfs: RPC call returned error 512
>>>>>> .
>>>>>> .
>>>>>> .
>>>>>> [32933.285126] INFO: task kworker/u145:23:886141 blocked for more
>>>>>> than 122 seconds.
>>>>>> [32933.293543]       Tainted: G S                5.18.0-rc6 #1
>>>>>> [32933.299869] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>>>>>> disables this
>>>>>> message.
>>>>>> [32933.308740] task:kworker/u145:23 state:D stack:    0 pid:886141
>>>>>> ppid:     2
>>>>>> flags:0x00004000
>>>>>> [32933.318321] Workqueue: rpciod rpc_async_schedule [sunrpc]
>>>>>> [32933.324524] Call Trace:
>>>>>> [32933.327347]  <TASK>
>>>>>> [32933.329785]  __schedule+0x3dd/0x970
>>>>>> [32933.333783]  schedule+0x41/0xa0
>>>>>> [32933.337388]  xprt_request_dequeue_xprt+0xd1/0x140 [sunrpc]
>>>>>> [32933.343639]  ? prepare_to_wait+0xd0/0xd0
>>>>>> [32933.348123]  ? rpc_destroy_wait_queue+0x10/0x10 [sunrpc]
>>>>>> [32933.354183]  xprt_release+0x26/0x140 [sunrpc]
>>>>>> [32933.359168]  ? rpc_destroy_wait_queue+0x10/0x10 [sunrpc]
>>>>>> [32933.365225]  rpc_release_resources_task+0xe/0x50 [sunrpc]
>>>>>> [32933.371381]  __rpc_execute+0x2c5/0x4e0 [sunrpc]
>>>>>> [32933.376564]  ? __switch_to_asm+0x42/0x70
>>>>>> [32933.381046]  ? finish_task_switch+0xb2/0x2c0
>>>>>> [32933.385918]  rpc_async_schedule+0x29/0x40 [sunrpc]
>>>>>> [32933.391391]  process_one_work+0x1c8/0x390
>>>>>> [32933.395975]  worker_thread+0x30/0x360
>>>>>> [32933.400162]  ? process_one_work+0x390/0x390
>>>>>> [32933.404931]  kthread+0xd9/0x100
>>>>>> [32933.408536]  ? kthread_complete_and_exit+0x20/0x20
>>>>>> [32933.413984]  ret_from_fork+0x22/0x30
>>>>>> [32933.418074]  </TASK>
>>>>>> 
>>>>>> The call trace shows up again at 245, 368, and 491 seconds. Same
>>>>>> task, same trace.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> That's very helpful. The above trace suggests that the RDMA code is
>>>>> leaking a call to xprt_unpin_rqst().
>>>> 
>>>> IMHO this is unlikely to be related to the performance
>>>> regression -- none of this code has changed in the past 5
>>>> kernel releases. Could be a different issue, though.
>>>> 
>>>> As is often the case in these situations, the INFO trace
>>>> above happens long after the issue that caused the missing
>>>> unpin. So... unless Dennis has a reproducer that can trigger
>>>> the issue frequently, I don't think there's much that can
>>>> be extracted from that.
>>> 
>>> To be fair, I've only seen this one time and have had the performance regression
>>> since -rc1.
>>> 
>>>> Also "nfs: RPC call returned error 512" suggests someone
>>>> hit ^C at some point. It's always possible that the
>>>> xprt_rdma_free() path is missing an unpin. But again,
>>>> that's not likely to be related to performance.
>>> 
>>> I've checked our test code and after 10 minutes it does give up trying to do the
>>> NFS copies and aborts (SIG_INT) the test.
>> 
>> After sleeping on it, I'm fairly certain the stack trace
>> above is a result of a gap in how xprtrdma handles a
>> signaled RPC.
>> 
>> Signal handling in that code is pretty hard to test, so not
>> surprising that there's a lingering bug or two. One idea I
>> had was to add a fault injector in the RPC scheduler to
>> throw signals at random. I think it can be done without
>> perturbing the hot path. Maybe I'll post an RFC patch.
>> 
>> 
>>> So in all my tests and bisect attempts it seems the possibility to hit a slow
>>> NFS operation that hangs for minutes has been possible for quite some time.
>>> However in 5.18 it gets much worse.
>>> 
>>> Any likely places I should add traces to try and find out what's stuck or taking
>>> time?
>> 
>> There's been a lot of churn in that area in recent releases,
>> so I'm not familiar with the existing tracepoints. Maybe
>> Ben or Trond could provide some guidance.
> 

--
Chuck Lever




^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: NFS regression between 5.17 and 5.18
  2022-06-20 14:11                           ` Chuck Lever III
@ 2022-06-20 14:29                             ` Thorsten Leemhuis
  2022-06-20 14:40                               ` Chuck Lever III
  0 siblings, 1 reply; 25+ messages in thread
From: Thorsten Leemhuis @ 2022-06-20 14:29 UTC (permalink / raw)
  To: Chuck Lever III
  Cc: Dennis Dalessandro, Trond Myklebust, Olga Kornievskaia,
	Linux NFS Mailing List

On 20.06.22 16:11, Chuck Lever III wrote:
> 
> 
>> On Jun 20, 2022, at 3:46 AM, Thorsten Leemhuis <regressions@leemhuis.info> wrote:
>>
>> Dennis, Chuck, I have below issue on the list of tracked regressions.
>> What's the status? Has any progress been made? Or is this not really a
>> regression and can be ignored?
>>
>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>>
>> P.S.: As the Linux kernel's regression tracker I deal with a lot of
>> reports and sometimes miss something important when writing mails like
>> this. If that's the case here, don't hesitate to tell me in a public
>> reply, it's in everyone's interest to set the public record straight.
>>
>> #regzbot poke
>> ##regzbot unlink: https://bugzilla.kernel.org/show_bug.cgi?id=215890
> 
> The above link points to an Apple trackpad bug.

Yeah, I know, sorry, I should have mentioned: either I or my bot did
something stupid and associated that report with this regression; that's
why I disassociated it with the "unlink" command.

> The bug described all the way at the bottom was the origin problem
> report. I believe this is an NFS client issue. We are waiting for
> a response from the NFS client maintainers to help Dennis track
> this down.

Many thx for the status update. Can anything be done to speed things up?
This has taken quite a long time already -- way longer than outlined in
"Prioritize work on fixing regressions" here:
https://docs.kernel.org/process/handling-regressions.html

Ciao, Thorsten

>> On 17.05.22 16:02, Chuck Lever III wrote:
>>>> On May 17, 2022, at 9:40 AM, Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> wrote:
>>>>
>>>> On 5/13/22 10:59 AM, Chuck Lever III wrote:
>>>>>>>
>>>>>>> Ran a test with -rc6 and this time see a hung task trace on the
>>>>>>> console as well
>>>>>>> as an NFS RPC error.
>>>>>>>
>>>>>>> [32719.991175] nfs: RPC call returned error 512
>>>>>>> .
>>>>>>> .
>>>>>>> .
>>>>>>> [32933.285126] INFO: task kworker/u145:23:886141 blocked for more
>>>>>>> than 122 seconds.
>>>>>>> [32933.293543]       Tainted: G S                5.18.0-rc6 #1
>>>>>>> [32933.299869] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>>>>>>> disables this
>>>>>>> message.
>>>>>>> [32933.308740] task:kworker/u145:23 state:D stack:    0 pid:886141
>>>>>>> ppid:     2
>>>>>>> flags:0x00004000
>>>>>>> [32933.318321] Workqueue: rpciod rpc_async_schedule [sunrpc]
>>>>>>> [32933.324524] Call Trace:
>>>>>>> [32933.327347]  <TASK>
>>>>>>> [32933.329785]  __schedule+0x3dd/0x970
>>>>>>> [32933.333783]  schedule+0x41/0xa0
>>>>>>> [32933.337388]  xprt_request_dequeue_xprt+0xd1/0x140 [sunrpc]
>>>>>>> [32933.343639]  ? prepare_to_wait+0xd0/0xd0
>>>>>>> [32933.348123]  ? rpc_destroy_wait_queue+0x10/0x10 [sunrpc]
>>>>>>> [32933.354183]  xprt_release+0x26/0x140 [sunrpc]
>>>>>>> [32933.359168]  ? rpc_destroy_wait_queue+0x10/0x10 [sunrpc]
>>>>>>> [32933.365225]  rpc_release_resources_task+0xe/0x50 [sunrpc]
>>>>>>> [32933.371381]  __rpc_execute+0x2c5/0x4e0 [sunrpc]
>>>>>>> [32933.376564]  ? __switch_to_asm+0x42/0x70
>>>>>>> [32933.381046]  ? finish_task_switch+0xb2/0x2c0
>>>>>>> [32933.385918]  rpc_async_schedule+0x29/0x40 [sunrpc]
>>>>>>> [32933.391391]  process_one_work+0x1c8/0x390
>>>>>>> [32933.395975]  worker_thread+0x30/0x360
>>>>>>> [32933.400162]  ? process_one_work+0x390/0x390
>>>>>>> [32933.404931]  kthread+0xd9/0x100
>>>>>>> [32933.408536]  ? kthread_complete_and_exit+0x20/0x20
>>>>>>> [32933.413984]  ret_from_fork+0x22/0x30
>>>>>>> [32933.418074]  </TASK>
>>>>>>>
>>>>>>> The call trace shows up again at 245, 368, and 491 seconds. Same
>>>>>>> task, same trace.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> That's very helpful. The above trace suggests that the RDMA code is
>>>>>> leaking a call to xprt_unpin_rqst().
>>>>>
>>>>> IMHO this is unlikely to be related to the performance
>>>>> regression -- none of this code has changed in the past 5
>>>>> kernel releases. Could be a different issue, though.
>>>>>
>>>>> As is often the case in these situations, the INFO trace
>>>>> above happens long after the issue that caused the missing
>>>>> unpin. So... unless Dennis has a reproducer that can trigger
>>>>> the issue frequently, I don't think there's much that can
>>>>> be extracted from that.
>>>>
>>>> To be fair, I've only seen this one time and have had the performance regression
>>>> since -rc1.
>>>>
>>>>> Also "nfs: RPC call returned error 512" suggests someone
>>>>> hit ^C at some point. It's always possible that the
>>>>> xprt_rdma_free() path is missing an unpin. But again,
>>>>> that's not likely to be related to performance.
>>>>
>>>> I've checked our test code and after 10 minutes it does give up trying to do the
>>>> NFS copies and aborts (SIG_INT) the test.
>>>
>>> After sleeping on it, I'm fairly certain the stack trace
>>> above is a result of a gap in how xprtrdma handles a
>>> signaled RPC.
>>>
>>> Signal handling in that code is pretty hard to test, so not
>>> surprising that there's a lingering bug or two. One idea I
>>> had was to add a fault injector in the RPC scheduler to
>>> throw signals at random. I think it can be done without
>>> perturbing the hot path. Maybe I'll post an RFC patch.
>>>
>>>
>>>> So in all my tests and bisect attempts it seems the possibility to hit a slow
>>>> NFS operation that hangs for minutes has been possible for quite some time.
>>>> However in 5.18 it gets much worse.
>>>>
>>>> Any likely places I should add traces to try and find out what's stuck or taking
>>>> time?
>>>
>>> There's been a lot of churn in that area in recent releases,
>>> so I'm not familiar with the existing tracepoints. Maybe
>>> Ben or Trond could provide some guidance.
>>
> 
> --
> Chuck Lever
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: NFS regression between 5.17 and 5.18
  2022-06-20 14:29                             ` Thorsten Leemhuis
@ 2022-06-20 14:40                               ` Chuck Lever III
  2022-06-20 17:06                                 ` Dennis Dalessandro
  0 siblings, 1 reply; 25+ messages in thread
From: Chuck Lever III @ 2022-06-20 14:40 UTC (permalink / raw)
  To: Thorsten Leemhuis
  Cc: Dennis Dalessandro, Trond Myklebust, Olga Kornievskaia,
	Linux NFS Mailing List

Hi Thorsten-

> On Jun 20, 2022, at 10:29 AM, Thorsten Leemhuis <regressions@leemhuis.info> wrote:
> 
> On 20.06.22 16:11, Chuck Lever III wrote:
>> 
>> 
>>> On Jun 20, 2022, at 3:46 AM, Thorsten Leemhuis <regressions@leemhuis.info> wrote:
>>> 
>>> Dennis, Chuck, I have below issue on the list of tracked regressions.
>>> What's the status? Has any progress been made? Or is this not really a
>>> regression and can be ignored?
>>> 
>>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>>> 
>>> P.S.: As the Linux kernel's regression tracker I deal with a lot of
>>> reports and sometimes miss something important when writing mails like
>>> this. If that's the case here, don't hesitate to tell me in a public
>>> reply, it's in everyone's interest to set the public record straight.
>>> 
>>> #regzbot poke
>>> ##regzbot unlink: https://bugzilla.kernel.org/show_bug.cgi?id=215890
>> 
>> The above link points to an Apple trackpad bug.
> 
> Yeah, I know, sorry, should have mentioned: either I or my bot did
> something stupid and associated that report with this regression, that's
> why I deassociated it with the "unlink" command.

Is there an open bugzilla for the original regression?


>> The bug described all the way at the bottom was the origin problem
>> report. I believe this is an NFS client issue. We are waiting for
>> a response from the NFS client maintainers to help Dennis track
>> this down.
> 
> Many thx for the status update. Can anything be done to speed things up?
> This is taken quite a long time already -- way longer that outlined in
> "Prioritize work on fixing regressions" here:
> https://docs.kernel.org/process/handling-regressions.html

ENOTMYMONKEYS ;-)

I was involved to help with the ^C issue that happened while
Dennis was troubleshooting. It's not related to the original
regression, which needs to be pursued by the NFS client
maintainers.

The correct people to poke are Trond, Olga (both cc'd) and
Anna Schumaker.


--
Chuck Lever




^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: NFS regression between 5.17 and 5.18
  2022-06-20 14:40                               ` Chuck Lever III
@ 2022-06-20 17:06                                 ` Dennis Dalessandro
  2022-06-21 16:04                                   ` Olga Kornievskaia
  0 siblings, 1 reply; 25+ messages in thread
From: Dennis Dalessandro @ 2022-06-20 17:06 UTC (permalink / raw)
  To: Chuck Lever III, Thorsten Leemhuis
  Cc: Trond Myklebust, Olga Kornievskaia, Linux NFS Mailing List

On 6/20/22 10:40 AM, Chuck Lever III wrote:
> Hi Thorsten-
> 
>> On Jun 20, 2022, at 10:29 AM, Thorsten Leemhuis <regressions@leemhuis.info> wrote:
>>
>> On 20.06.22 16:11, Chuck Lever III wrote:
>>>
>>>
>>>> On Jun 20, 2022, at 3:46 AM, Thorsten Leemhuis <regressions@leemhuis.info> wrote:
>>>>
>>>> Dennis, Chuck, I have below issue on the list of tracked regressions.
>>>> What's the status? Has any progress been made? Or is this not really a
>>>> regression and can be ignored?
>>>>
>>>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>>>>
>>>> P.S.: As the Linux kernel's regression tracker I deal with a lot of
>>>> reports and sometimes miss something important when writing mails like
>>>> this. If that's the case here, don't hesitate to tell me in a public
>>>> reply, it's in everyone's interest to set the public record straight.
>>>>
>>>> #regzbot poke
>>>> ##regzbot unlink: https://bugzilla.kernel.org/show_bug.cgi?id=215890
>>>
>>> The above link points to an Apple trackpad bug.
>>
>> Yeah, I know, sorry, should have mentioned: either I or my bot did
>> something stupid and associated that report with this regression, that's
>> why I deassociated it with the "unlink" command.
> 
> Is there an open bugzilla for the original regression?
> 
> 
>>> The bug described all the way at the bottom was the origin problem
>>> report. I believe this is an NFS client issue. We are waiting for
>>> a response from the NFS client maintainers to help Dennis track
>>> this down.
>>
>> Many thx for the status update. Can anything be done to speed things up?
>> This is taken quite a long time already -- way longer that outlined in
>> "Prioritize work on fixing regressions" here:
>> https://docs.kernel.org/process/handling-regressions.html
> 
> ENOTMYMONKEYS ;-)
> 
> I was involved to help with the ^C issue that happened while
> Dennis was troubleshooting. It's not related to the original
> regression, which needs to be pursued by the NFS client
> maintainers.
> 
> The correct people to poke are Trond, Olga (both cc'd) and
> Anna Schumaker.

Perhaps I should open a bugzilla for the regression. The Ctrl+C issue was a
result of the test we were running taking too long. It times out after 10
minutes or so and kills the process. So it's a downstream effect of the regression.

The test is still failing as of 5.19-rc2. I'll double check that it's
the same issue and open a bugzilla.

Thanks for poking at this.

-Denny

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: NFS regression between 5.17 and 5.18
  2022-06-20 17:06                                 ` Dennis Dalessandro
@ 2022-06-21 16:04                                   ` Olga Kornievskaia
  2022-06-21 16:58                                     ` Dennis Dalessandro
  0 siblings, 1 reply; 25+ messages in thread
From: Olga Kornievskaia @ 2022-06-21 16:04 UTC (permalink / raw)
  To: Dennis Dalessandro
  Cc: Chuck Lever III, Thorsten Leemhuis, Trond Myklebust,
	Linux NFS Mailing List

Hi Dennis,

Can I ask some basic questions? Have you tried to get any kind of
profiling done to see where the client is spending time (using perf,
perhaps)?

real    4m11.835s
user    0m0.001s
sys     0m0.277s

It sounds like about 4 minutes are spent sleeping somewhere. Did it take
4 minutes to do a network transfer (if we had a network trace we could
see how long the network transfers were)? Do you have one (along with
something that can tell us approximately when the request began from
the cp command's perspective, like a date beforehand)?

I see that there were no rdma changes that went into the 5.18 kernel, so
whatever changed is either generic nfs behaviour or perhaps something
in the rdma core code (is a Mellanox card being used here?)

I wonder if the slowdown only happens on rdma or is visible on a
tcp mount as well -- have you tried?



On Mon, Jun 20, 2022 at 1:06 PM Dennis Dalessandro
<dennis.dalessandro@cornelisnetworks.com> wrote:
>
> On 6/20/22 10:40 AM, Chuck Lever III wrote:
> > Hi Thorsten-
> >
> >> On Jun 20, 2022, at 10:29 AM, Thorsten Leemhuis <regressions@leemhuis.info> wrote:
> >>
> >> On 20.06.22 16:11, Chuck Lever III wrote:
> >>>
> >>>
> >>>> On Jun 20, 2022, at 3:46 AM, Thorsten Leemhuis <regressions@leemhuis.info> wrote:
> >>>>
> >>>> Dennis, Chuck, I have below issue on the list of tracked regressions.
> >>>> What's the status? Has any progress been made? Or is this not really a
> >>>> regression and can be ignored?
> >>>>
> >>>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> >>>>
> >>>> P.S.: As the Linux kernel's regression tracker I deal with a lot of
> >>>> reports and sometimes miss something important when writing mails like
> >>>> this. If that's the case here, don't hesitate to tell me in a public
> >>>> reply, it's in everyone's interest to set the public record straight.
> >>>>
> >>>> #regzbot poke
> >>>> ##regzbot unlink: https://bugzilla.kernel.org/show_bug.cgi?id=215890
> >>>
> >>> The above link points to an Apple trackpad bug.
> >>
> >> Yeah, I know, sorry, should have mentioned: either I or my bot did
> >> something stupid and associated that report with this regression, that's
> >> why I deassociated it with the "unlink" command.
> >
> > Is there an open bugzilla for the original regression?
> >
> >
> >>> The bug described all the way at the bottom was the origin problem
> >>> report. I believe this is an NFS client issue. We are waiting for
> >>> a response from the NFS client maintainers to help Dennis track
> >>> this down.
> >>
> >> Many thx for the status update. Can anything be done to speed things up?
> >> This is taken quite a long time already -- way longer that outlined in
> >> "Prioritize work on fixing regressions" here:
> >> https://docs.kernel.org/process/handling-regressions.html
> >
> > ENOTMYMONKEYS ;-)
> >
> > I was involved to help with the ^C issue that happened while
> > Dennis was troubleshooting. It's not related to the original
> > regression, which needs to be pursued by the NFS client
> > maintainers.
> >
> > The correct people to poke are Trond, Olga (both cc'd) and
> > Anna Schumaker.
>
> Perhaps I should open a bugzilla for the regression. The Ctrl+C issue was a
> result of the test we were running taking too long. It times out after 10
> minutes or so and kills the process. So a downstream effect of the regression.
>
> The test is still continuing to fail as of 5.19-rc2. I'll double check that it's
> the same issue and open a bugzilla.
>
> Thanks for poking at this.
>
> -Denny

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: NFS regression between 5.17 and 5.18
  2022-06-21 16:04                                   ` Olga Kornievskaia
@ 2022-06-21 16:58                                     ` Dennis Dalessandro
  2022-06-21 17:51                                       ` Olga Kornievskaia
  0 siblings, 1 reply; 25+ messages in thread
From: Dennis Dalessandro @ 2022-06-21 16:58 UTC (permalink / raw)
  To: Olga Kornievskaia
  Cc: Chuck Lever III, Thorsten Leemhuis, Trond Myklebust,
	Linux NFS Mailing List

On 6/21/22 12:04 PM, Olga Kornievskaia wrote:
> Hi Dennis,
> 
> Can I ask some basic questions? Have you tried to get any kinds of
> profiling done to see where the client is spending time (using perf
> perhaps)?
> 
> real    4m11.835s
> user    0m0.001s
> sys     0m0.277s
> 
> sounds like 4ms are spent sleeping somewhere? Did it take 4mins to do
> a network transfer (if we had a network trace we could see how long
> network transfer were)? Do you have one (that goes along with
> something that can tell us approximately when the request began from
> the cp's perspective, like a date before hand)?
> 
> I see that there were no rdma changes that went into 5.18 kernel so
> whatever changed either a generic nfs behaviour or perhaps something
> in the rdma core code (is an mellonax card being used here?)
> 
> I wonder if the slowdown only happens on rdma or is it visible on the
> tcp mount as well, have you tried?
> 

Hi Olga,

I have opened a Kernel Bugzilla if you would rather log future responses there:
https://bugzilla.kernel.org/show_bug.cgi?id=216160

To answer your above questions: This is on Omni-Path hardware. I have not tried
a TCP mount, but I can. I don't have any network trace per se, or a profile.
We don't have support for anything like a tcpdump. However, I can tell you
there is nothing going over the network while it appears to be hung. I can
monitor the packet counters.

If you have some ideas about where I could put trace points that could tell us
something, I can certainly add those.

-Denny

> 
> 
> On Mon, Jun 20, 2022 at 1:06 PM Dennis Dalessandro
> <dennis.dalessandro@cornelisnetworks.com> wrote:
>>
>> On 6/20/22 10:40 AM, Chuck Lever III wrote:
>>> Hi Thorsten-
>>>
>>>> On Jun 20, 2022, at 10:29 AM, Thorsten Leemhuis <regressions@leemhuis.info> wrote:
>>>>
>>>> On 20.06.22 16:11, Chuck Lever III wrote:
>>>>>
>>>>>
>>>>>> On Jun 20, 2022, at 3:46 AM, Thorsten Leemhuis <regressions@leemhuis.info> wrote:
>>>>>>
>>>>>> Dennis, Chuck, I have below issue on the list of tracked regressions.
>>>>>> What's the status? Has any progress been made? Or is this not really a
>>>>>> regression and can be ignored?
>>>>>>
>>>>>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>>>>>>
>>>>>> P.S.: As the Linux kernel's regression tracker I deal with a lot of
>>>>>> reports and sometimes miss something important when writing mails like
>>>>>> this. If that's the case here, don't hesitate to tell me in a public
>>>>>> reply, it's in everyone's interest to set the public record straight.
>>>>>>
>>>>>> #regzbot poke
>>>>>> ##regzbot unlink: https://bugzilla.kernel.org/show_bug.cgi?id=215890
>>>>>
>>>>> The above link points to an Apple trackpad bug.
>>>>
>>>> Yeah, I know, sorry, should have mentioned: either I or my bot did
>>>> something stupid and associated that report with this regression, that's
>>>> why I deassociated it with the "unlink" command.
>>>
>>> Is there an open bugzilla for the original regression?
>>>
>>>
>>>>> The bug described all the way at the bottom was the origin problem
>>>>> report. I believe this is an NFS client issue. We are waiting for
>>>>> a response from the NFS client maintainers to help Dennis track
>>>>> this down.
>>>>
>>>> Many thx for the status update. Can anything be done to speed things up?
>>>> This is taken quite a long time already -- way longer that outlined in
>>>> "Prioritize work on fixing regressions" here:
>>>> https://docs.kernel.org/process/handling-regressions.html
>>>
>>> ENOTMYMONKEYS ;-)
>>>
>>> I was involved to help with the ^C issue that happened while
>>> Dennis was troubleshooting. It's not related to the original
>>> regression, which needs to be pursued by the NFS client
>>> maintainers.
>>>
>>> The correct people to poke are Trond, Olga (both cc'd) and
>>> Anna Schumaker.
>>
>> Perhaps I should open a bugzilla for the regression. The Ctrl+C issue was a
>> result of the test we were running taking too long. It times out after 10
>> minutes or so and kills the process. So a downstream effect of the regression.
>>
>> The test is still continuing to fail as of 5.19-rc2. I'll double check that it's
>> the same issue and open a bugzilla.
>>
>> Thanks for poking at this.
>>
>> -Denny

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: NFS regression between 5.17 and 5.18
  2022-06-21 16:58                                     ` Dennis Dalessandro
@ 2022-06-21 17:51                                       ` Olga Kornievskaia
  2022-06-21 17:53                                         ` Chuck Lever III
  0 siblings, 1 reply; 25+ messages in thread
From: Olga Kornievskaia @ 2022-06-21 17:51 UTC (permalink / raw)
  To: Dennis Dalessandro
  Cc: Chuck Lever III, Thorsten Leemhuis, Trond Myklebust,
	Linux NFS Mailing List

On Tue, Jun 21, 2022 at 12:58 PM Dennis Dalessandro
<dennis.dalessandro@cornelisnetworks.com> wrote:
>
> On 6/21/22 12:04 PM, Olga Kornievskaia wrote:
> > Hi Dennis,
> >
> > Can I ask some basic questions? Have you tried to get any kinds of
> > profiling done to see where the client is spending time (using perf
> > perhaps)?
> >
> > real    4m11.835s
> > user    0m0.001s
> > sys     0m0.277s
> >
> > sounds like 4ms are spent sleeping somewhere? Did it take 4mins to do
> > a network transfer (if we had a network trace we could see how long
> > network transfer were)? Do you have one (that goes along with
> > something that can tell us approximately when the request began from
> > the cp's perspective, like a date before hand)?
> >
> > I see that there were no rdma changes that went into 5.18 kernel so
> > whatever changed either a generic nfs behaviour or perhaps something
> > in the rdma core code (is an mellonax card being used here?)
> >
> > I wonder if the slowdown only happens on rdma or is it visible on the
> > tcp mount as well, have you tried?
> >
>
> Hi Olga,
>
> I have opened a Kernel Bugzilla if you would rather log future responses there:
> https://bugzilla.kernel.org/show_bug.cgi?id=216160
>
> To answer your above questions: This is on Omni-Path hardware. I have not tried
> the TCP mount, I can though. I don't have any network trace per-se or a profile.
> We don't support like a TCP dump or anything like that. However I can tell you
> there is nothing going over the network while it appears to be hung. I can
> monitor the packet counters.

In this thread there are two problems raised: (1) the performance regression,
and (2) a single run that hit a hung client.

For #1, given that there were no rdma changes added to 5.18, it seems
like something in the generic nfs code is causing issues for you. I
therefore recommend first using a linux profiler to get some
information about the time spent in the kernel functions that are
triggered by the cp command. If you can't run a profiler (which I
think you should be able to), then perhaps just enable the nfs4 and
rpcrdma tracepoints, which also have timestamps in them; looking at
the differences can give some clue about where the time is being
spent. Four minutes is a significant chunk of time and should be
visible somewhere in those timestamps.

For #2, I have personally run into that stack trace while
investigating a hang using soft iWarp as the rdma provider. It was an
unpinned request, but I think it was due to a soft iWarp failure that
caused it not to deliver a completion to rdma, which led to the request
never getting unpinned. Thus I would recommend looking into failures in
your rdma provider for clues on that problem.

> If you have some ideas where I could put some trace points that could tell us
> something I can certainly add those.
>
> -Denny
>
> >
> >
> > On Mon, Jun 20, 2022 at 1:06 PM Dennis Dalessandro
> > <dennis.dalessandro@cornelisnetworks.com> wrote:
> >>
> >> On 6/20/22 10:40 AM, Chuck Lever III wrote:
> >>> Hi Thorsten-
> >>>
> >>>> On Jun 20, 2022, at 10:29 AM, Thorsten Leemhuis <regressions@leemhuis.info> wrote:
> >>>>
> >>>> On 20.06.22 16:11, Chuck Lever III wrote:
> >>>>>
> >>>>>
> >>>>>> On Jun 20, 2022, at 3:46 AM, Thorsten Leemhuis <regressions@leemhuis.info> wrote:
> >>>>>>
> >>>>>> Dennis, Chuck, I have below issue on the list of tracked regressions.
> >>>>>> What's the status? Has any progress been made? Or is this not really a
> >>>>>> regression and can be ignored?
> >>>>>>
> >>>>>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> >>>>>>
> >>>>>> P.S.: As the Linux kernel's regression tracker I deal with a lot of
> >>>>>> reports and sometimes miss something important when writing mails like
> >>>>>> this. If that's the case here, don't hesitate to tell me in a public
> >>>>>> reply, it's in everyone's interest to set the public record straight.
> >>>>>>
> >>>>>> #regzbot poke
> >>>>>> ##regzbot unlink: https://bugzilla.kernel.org/show_bug.cgi?id=215890
> >>>>>
> >>>>> The above link points to an Apple trackpad bug.
> >>>>
> >>>> Yeah, I know, sorry, I should have mentioned: either I or my bot did
> >>>> something stupid and associated that report with this regression;
> >>>> that's why I disassociated it with the "unlink" command.
> >>>
> >>> Is there an open bugzilla for the original regression?
> >>>
> >>>
> >>>>> The bug described all the way at the bottom was the original problem
> >>>>> report. I believe this is an NFS client issue. We are waiting for
> >>>>> a response from the NFS client maintainers to help Dennis track
> >>>>> this down.
> >>>>
> >>>> Many thx for the status update. Can anything be done to speed things up?
> >>>> This has taken quite a long time already -- way longer than outlined in
> >>>> "Prioritize work on fixing regressions" here:
> >>>> https://docs.kernel.org/process/handling-regressions.html
> >>>
> >>> ENOTMYMONKEYS ;-)
> >>>
> >>> I was involved to help with the ^C issue that happened while
> >>> Dennis was troubleshooting. It's not related to the original
> >>> regression, which needs to be pursued by the NFS client
> >>> maintainers.
> >>>
> >>> The correct people to poke are Trond, Olga (both cc'd) and
> >>> Anna Schumaker.
> >>
> >> Perhaps I should open a bugzilla for the regression. The Ctrl+C issue was a
> >> result of the test we were running taking too long: it times out after 10
> >> minutes or so and kills the process, so it is a downstream effect of the
> >> regression.
> >>
> >> The test still fails as of 5.19-rc2. I'll double-check that it's
> >> the same issue and open a bugzilla.
> >>
> >> Thanks for poking at this.
> >>
> >> -Denny

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: NFS regression between 5.17 and 5.18
  2022-06-21 17:51                                       ` Olga Kornievskaia
@ 2022-06-21 17:53                                         ` Chuck Lever III
  0 siblings, 0 replies; 25+ messages in thread
From: Chuck Lever III @ 2022-06-21 17:53 UTC (permalink / raw)
  To: Olga Kornievskaia
  Cc: Dennis Dalessandro, Thorsten Leemhuis, Trond Myklebust,
	Linux NFS Mailing List



> On Jun 21, 2022, at 1:51 PM, Olga Kornievskaia <aglo@umich.edu> wrote:
> 
> For #2, I have personally run into that stack trace while
> investigating a hang using soft iWarp as the rdma provider. It was a
> request that never got unpinned, and I think that was due to a soft
> iWarp failure that caused it not to deliver a completion to rdma,
> which is what left the request pinned. Thus I would recommend looking
> into failures in your rdma provider for clues on that problem.

#2 is a separate problem entirely, and is a known bug. I plan to
add some fault injection so we can reproduce and track it down.
I think you should prune this issue and focus just on #1 for now.

--
Chuck Lever




^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2022-06-21 17:54 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-28 13:05 NFS regression between 5.17 and 5.18 Dennis Dalessandro
2022-04-28 14:57 ` Chuck Lever III
2022-04-28 15:42   ` Dennis Dalessandro
2022-04-28 19:47     ` Dennis Dalessandro
2022-04-28 19:56       ` Trond Myklebust
2022-04-29 12:54         ` Dennis Dalessandro
2022-04-29 13:37           ` Chuck Lever III
2022-05-06 13:24             ` Dennis Dalessandro
2022-05-13 11:58               ` Dennis Dalessandro
2022-05-13 13:30                 ` Trond Myklebust
2022-05-13 14:59                   ` Chuck Lever III
2022-05-13 15:19                     ` Trond Myklebust
2022-05-13 18:53                       ` Chuck Lever III
2022-05-17 13:40                     ` Dennis Dalessandro
2022-05-17 14:02                       ` Chuck Lever III
2022-06-20  7:46                         ` Thorsten Leemhuis
2022-06-20 14:11                           ` Chuck Lever III
2022-06-20 14:29                             ` Thorsten Leemhuis
2022-06-20 14:40                               ` Chuck Lever III
2022-06-20 17:06                                 ` Dennis Dalessandro
2022-06-21 16:04                                   ` Olga Kornievskaia
2022-06-21 16:58                                     ` Dennis Dalessandro
2022-06-21 17:51                                       ` Olga Kornievskaia
2022-06-21 17:53                                         ` Chuck Lever III
2022-05-04  9:45 ` Thorsten Leemhuis
