* NFS server regression in kernel 5.13 (tested w/ 5.13.9)
@ 2021-08-08 22:37 Mike Javorski
  2021-08-08 22:47 ` Chuck Lever III
  2021-08-09  0:01 ` NeilBrown
  0 siblings, 2 replies; 47+ messages in thread
From: Mike Javorski @ 2021-08-08 22:37 UTC (permalink / raw)
  To: linux-nfs

I have been experiencing nfs file access hangs with multiple release
versions of the 5.13.x linux kernel. In each case, all file transfers
freeze for 5-10 seconds and then resume. This seems worse when reading
through many files sequentially.

My server:
- Archlinux w/ a distribution provided kernel package
- filesystems exported with "rw,sync,no_subtree_check,insecure" options

Client:
- Archlinux w/ latest distribution provided kernel (5.13.9-arch1-1 at writing)
- nfs mounted via /net autofs with "soft,nodev,nosuid" options
(ver=4.2 is indicated in mount)

I have tried the 5.13.x kernel several times since the first arch
release (most recently with 5.13.9-arch1-1), all with similar results.
Each time, I am forced to downgrade the linux package to a 5.12.x
kernel (5.12.15-arch1 as of writing) to clear up the transfer issues
and stabilize performance. No other changes are made between tests. I
have confirmed the freezing behavior using both ext4 and btrfs
filesystems exported from this server.

At this point I would appreciate some guidance in what to provide in
order to diagnose and resolve this issue. I don't have a lot of kernel
debugging experience, so instruction would be helpful.

- mike

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: NFS server regression in kernel 5.13 (tested w/ 5.13.9)
  2021-08-08 22:37 NFS server regression in kernel 5.13 (tested w/ 5.13.9) Mike Javorski
@ 2021-08-08 22:47 ` Chuck Lever III
  2021-08-08 23:23   ` Mike Javorski
  2021-08-09  0:01 ` NeilBrown
  1 sibling, 1 reply; 47+ messages in thread
From: Chuck Lever III @ 2021-08-08 22:47 UTC (permalink / raw)
  To: Mike Javorski; +Cc: Linux NFS Mailing List



> On Aug 8, 2021, at 6:37 PM, Mike Javorski <mike.javorski@gmail.com> wrote:
> 
> I have been experiencing nfs file access hangs with multiple release
> versions of the 5.13.x linux kernel. In each case, all file transfers
> freeze for 5-10 seconds and then resume. This seems worse when reading
> through many files sequentially.
> 
> My server:
> - Archlinux w/ a distribution provided kernel package
> - filesystems exported with "rw,sync,no_subtree_check,insecure" options
> 
> Client:
> - Archlinux w/ latest distribution provided kernel (5.13.9-arch1-1 at writing)
> - nfs mounted via /net autofs with "soft,nodev,nosuid" options
> (ver=4.2 is indicated in mount)
> 
> I have tried the 5.13.x kernel several times since the first arch
> release (most recently with 5.13.9-arch1-1), all with similar results.
> Each time, I am forced to downgrade the linux package to a 5.12.x
> kernel (5.12.15-arch1 as of writing) to clear up the transfer issues
> and stabilize performance. No other changes are made between tests. I
> have confirmed the freezing behavior using both ext4 and btrfs
> filesystems exported from this server.
> 
> At this point I would appreciate some guidance in what to provide in
> order to diagnose and resolve this issue. I don't have a lot of kernel
> debugging experience, so instruction would be helpful.

Hi Mike-

Thanks for the report.

Since you are using a distribution kernel and don't have much
kernel debugging experience, we typically ask you to work with
your distributor first. linux-nfs@ is a developer's mailing
list, we're not really prepared to provide user support.

When you report the problem to Arch, you might want to have
a description of your workload (especially if you can
reproduce the problem often with a particular application
or user activity), and maybe be prepared to capture a network
trace of the failure using tcpdump.


--
Chuck Lever




^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: NFS server regression in kernel 5.13 (tested w/ 5.13.9)
  2021-08-08 22:47 ` Chuck Lever III
@ 2021-08-08 23:23   ` Mike Javorski
  0 siblings, 0 replies; 47+ messages in thread
From: Mike Javorski @ 2021-08-08 23:23 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Linux NFS Mailing List

I appreciate that Chuck. It was not my intention to seek basic "user"
support, but as I don't know the intricacies of how NFS works in the
kernel, I don't know what information would be most useful to share to
help with the diagnostic effort. I have read through the
troubleshooting recommendations on the linux-nfs wiki, but the issue
is beyond what is covered there. My setup has been running without any
NFS issues for years at this point, and I can demonstrate the error
merely by changing the booted kernel, so I feel confident it's not a
simple configuration error.

I have raised an issue on the ArchLinux bug tracker as well
(https://bugs.archlinux.org/task/71775). Given that the ArchLinux
kernel carries only a minimal delta from mainline (I've checked, and
no fs/rpc/nfs files are changed in the 7-file delta vs mainline), my
previous experience is that they will refer me to the kernel subsystem
mailing list, but I will wait to hear back from them and see what they
say.

- mike

On Sun, Aug 8, 2021 at 3:47 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
>
>
>
> > On Aug 8, 2021, at 6:37 PM, Mike Javorski <mike.javorski@gmail.com> wrote:
> >
> > I have been experiencing nfs file access hangs with multiple release
> > versions of the 5.13.x linux kernel. In each case, all file transfers
> > freeze for 5-10 seconds and then resume. This seems worse when reading
> > through many files sequentially.
> >
> > My server:
> > - Archlinux w/ a distribution provided kernel package
> > - filesystems exported with "rw,sync,no_subtree_check,insecure" options
> >
> > Client:
> > - Archlinux w/ latest distribution provided kernel (5.13.9-arch1-1 at writing)
> > - nfs mounted via /net autofs with "soft,nodev,nosuid" options
> > (ver=4.2 is indicated in mount)
> >
> > I have tried the 5.13.x kernel several times since the first arch
> > release (most recently with 5.13.9-arch1-1), all with similar results.
> > Each time, I am forced to downgrade the linux package to a 5.12.x
> > kernel (5.12.15-arch1 as of writing) to clear up the transfer issues
> > and stabilize performance. No other changes are made between tests. I
> > have confirmed the freezing behavior using both ext4 and btrfs
> > filesystems exported from this server.
> >
> > At this point I would appreciate some guidance in what to provide in
> > order to diagnose and resolve this issue. I don't have a lot of kernel
> > debugging experience, so instruction would be helpful.
>
> Hi Mike-
>
> Thanks for the report.
>
> Since you are using a distribution kernel and don't have much
> kernel debugging experience, we typically ask you to work with
> your distributor first. linux-nfs@ is a developer's mailing
> list, we're not really prepared to provide user support.
>
> When you report the problem to Arch, you might want to have
> a description of your workload (especially if you can
> reproduce the problem often with a particular application
> or user activity), and maybe be prepared to capture a network
> trace of the failure using tcpdump.
>
>
> --
> Chuck Lever
>
>
>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: NFS server regression in kernel 5.13 (tested w/ 5.13.9)
  2021-08-08 22:37 NFS server regression in kernel 5.13 (tested w/ 5.13.9) Mike Javorski
  2021-08-08 22:47 ` Chuck Lever III
@ 2021-08-09  0:01 ` NeilBrown
  2021-08-09  0:28   ` Mike Javorski
  1 sibling, 1 reply; 47+ messages in thread
From: NeilBrown @ 2021-08-09  0:01 UTC (permalink / raw)
  To: Mike Javorski; +Cc: linux-nfs

On Mon, 09 Aug 2021, Mike Javorski wrote:
> I have been experiencing nfs file access hangs with multiple release
> versions of the 5.13.x linux kernel. In each case, all file transfers
> freeze for 5-10 seconds and then resume. This seems worse when reading
> through many files sequentially.

A particularly useful debugging tool for NFS freezes is to run

  rpcdebug -m rpc -c all

while the system appears frozen.  As you only have a 5-10 second window
this might be tricky.
Setting or clearing debug flags in the rpc module (whether they are
already set or not) has a side effect of listing all RPC "tasks" which are
waiting for a reply.  Seeing that task list can often be useful.

The task list appears in "dmesg" output.  If there are no tasks
waiting, nothing will be written which might lead you to think it didn't
work.

As Chuck hinted, tcpdump is invaluable for this sort of problem.
  tcpdump -s 0 -w /tmp/somefile.pcap port 2049

will capture NFS traffic.  If this can start before a hang, and finish
after, it may contain useful information.  Doing that in a way that
doesn't create an enormous file might be a challenge.  It would help if
you found a way to trigger the problem.  Take note of the circumstances
when it seems to happen the most.  If you can only produce a large file,
we can probably still work with it.
  tshark -r /tmp/somefile.pcap
will report the capture one line per packet.  You can look for the
appropriate timestamp, note the frame numbers, and use "editcap"
to extract a suitable range of packets.
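
For example (a sketch only; the frame numbers here are hypothetical):

  editcap -r /tmp/somefile.pcap /tmp/slice.pcap 11000-13000

keeps just packets 11000-13000 in a new file, which should compress to
something small enough to send.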

NeilBrown


> 
> My server:
> - Archlinux w/ a distribution provided kernel package
> - filesystems exported with "rw,sync,no_subtree_check,insecure" options
> 
> Client:
> - Archlinux w/ latest distribution provided kernel (5.13.9-arch1-1 at writing)
> - nfs mounted via /net autofs with "soft,nodev,nosuid" options
> (ver=4.2 is indicated in mount)
> 
> I have tried the 5.13.x kernel several times since the first arch
> release (most recently with 5.13.9-arch1-1), all with similar results.
> Each time, I am forced to downgrade the linux package to a 5.12.x
> kernel (5.12.15-arch1 as of writing) to clear up the transfer issues
> and stabilize performance. No other changes are made between tests. I
> have confirmed the freezing behavior using both ext4 and btrfs
> filesystems exported from this server.
> 
> At this point I would appreciate some guidance in what to provide in
> order to diagnose and resolve this issue. I don't have a lot of kernel
> debugging experience, so instruction would be helpful.
> 
> - mike
> 
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: NFS server regression in kernel 5.13 (tested w/ 5.13.9)
  2021-08-09  0:01 ` NeilBrown
@ 2021-08-09  0:28   ` Mike Javorski
  2021-08-10  0:50     ` Mike Javorski
  0 siblings, 1 reply; 47+ messages in thread
From: Mike Javorski @ 2021-08-09  0:28 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-nfs

Appreciate the insights Neil. I will make an attempt to
trigger/capture the fault if I can.

- mike

On Sun, Aug 8, 2021 at 5:01 PM NeilBrown <neilb@suse.de> wrote:
>
> On Mon, 09 Aug 2021, Mike Javorski wrote:
> > I have been experiencing nfs file access hangs with multiple release
> > versions of the 5.13.x linux kernel. In each case, all file transfers
> > freeze for 5-10 seconds and then resume. This seems worse when reading
> > through many files sequentially.
>
> A particularly useful debugging tool for NFS freezes is to run
>
>   rpcdebug -m rpc -c all
>
> while the system appears frozen.  As you only have a 5-10 second window
> this might be tricky.
> Setting or clearing debug flags in the rpc module (whether they are
> already set or not) has a side effect of listing all RPC "tasks" which are
> waiting for a reply.  Seeing that task list can often be useful.
>
> The task list appears in "dmesg" output.  If there are no tasks
> waiting, nothing will be written which might lead you to think it didn't
> work.
>
> As Chuck hinted, tcpdump is invaluable for this sort of problem.
>   tcpdump -s 0 -w /tmp/somefile.pcap port 2049
>
> will capture NFS traffic.  If this can start before a hang, and finish
> after, it may contain useful information.  Doing that in a way that
> doesn't create an enormous file might be a challenge.  It would help if
> you found a way to trigger the problem.  Take note of the circumstances
> when it seems to happen the most.  If you can only produce a large file,
> we can probably still work with it.
>   tshark -r /tmp/somefile.pcap
> will report the capture one line per packet.  You can look for the
> appropriate timestamp, note the frame numbers, and use "editcap"
> to extract a suitable range of packets.
>
> NeilBrown
>
>
> >
> > My server:
> > - Archlinux w/ a distribution provided kernel package
> > - filesystems exported with "rw,sync,no_subtree_check,insecure" options
> >
> > Client:
> > - Archlinux w/ latest distribution provided kernel (5.13.9-arch1-1 at writing)
> > - nfs mounted via /net autofs with "soft,nodev,nosuid" options
> > (ver=4.2 is indicated in mount)
> >
> > I have tried the 5.13.x kernel several times since the first arch
> > release (most recently with 5.13.9-arch1-1), all with similar results.
> > Each time, I am forced to downgrade the linux package to a 5.12.x
> > kernel (5.12.15-arch1 as of writing) to clear up the transfer issues
> > and stabilize performance. No other changes are made between tests. I
> > have confirmed the freezing behavior using both ext4 and btrfs
> > filesystems exported from this server.
> >
> > At this point I would appreciate some guidance in what to provide in
> > order to diagnose and resolve this issue. I don't have a lot of kernel
> > debugging experience, so instruction would be helpful.
> >
> > - mike
> >
> >

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: NFS server regression in kernel 5.13 (tested w/ 5.13.9)
  2021-08-09  0:28   ` Mike Javorski
@ 2021-08-10  0:50     ` Mike Javorski
  2021-08-10  1:28       ` NeilBrown
  0 siblings, 1 reply; 47+ messages in thread
From: Mike Javorski @ 2021-08-10  0:50 UTC (permalink / raw)
  To: linux-nfs

Managed to get the rpcdebug info for some freezes today. I had a loop
running to call the rpcdebug command that Neil provided and then sleep
for 2 seconds. Each triggering event was the opening of a new video
file (I was shuffling through files with mpv). I haven't managed to
get a tcp capture yet; I'm trying to get one that isn't too massive, but
the timing is tricky, and this doesn't happen every time.
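
(For context, the loop itself was nothing more elaborate than roughly
this, run as root on the client:

  while true; do rpcdebug -m rpc -c all; sleep 2; done

with the resulting task lists showing up in dmesg as below.)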

-----------------------------

[31229.965368] -pid- flgs status -client- --rqstp- -timeout ---ops--
[31229.965370] 44224 c801      0 8c478d70 2473aa3b    15395  8531859
nfsv4 READ a:call_status [sunrpc] q:xprt_pending
[31229.965411] 44225 c801      0 8c478d70 a7d3b9b8    15395  8531859
nfsv4 READ a:call_status [sunrpc] q:xprt_pending
[31231.966982] -pid- flgs status -client- --rqstp- -timeout ---ops--
[31231.966986] 44224 c801      0 8c478d70 2473aa3b    14795  8531859
nfsv4 READ a:call_status [sunrpc] q:xprt_pending
[31231.967027] 44225 c801      0 8c478d70 a7d3b9b8    14795  8531859
nfsv4 READ a:call_status [sunrpc] q:xprt_pending
[31233.968741] -pid- flgs status -client- --rqstp- -timeout ---ops--
[31233.968745] 44224 c801      0 8c478d70 2473aa3b    14194  8531859
nfsv4 READ a:call_status [sunrpc] q:xprt_pending
[31233.968787] 44225 c801      0 8c478d70 a7d3b9b8    14194  8531859
nfsv4 READ a:call_status [sunrpc] q:xprt_pending

-----------------------------

[31292.534529] -pid- flgs status -client- --rqstp- -timeout ---ops--
[31292.534530] 45744 c801      0 8c478d70 2473aa3b    15340  8531859
nfsv4 READ a:call_status [sunrpc] q:xprt_pending
[31292.534554] 45745 c801      0 8c478d70 50cbef7b    15342  8531859
nfsv4 READ a:call_status [sunrpc] q:xprt_pending
[31294.536203] -pid- flgs status -client- --rqstp- -timeout ---ops--
[31294.536206] 45744 c801      0 8c478d70 2473aa3b    14740  8531859
nfsv4 READ a:call_status [sunrpc] q:xprt_pending
[31294.536232] 45745 c801      0 8c478d70 50cbef7b    14742  8531859
nfsv4 READ a:call_status [sunrpc] q:xprt_pending
[31296.537826] -pid- flgs status -client- --rqstp- -timeout ---ops--
[31296.537830] 45744 c801      0 8c478d70 2473aa3b    14139  8531859
nfsv4 READ a:call_status [sunrpc] q:xprt_pending
[31296.537871] 45745 c801      0 8c478d70 50cbef7b    14141  8531859
nfsv4 READ a:call_status [sunrpc] q:xprt_pending
[31298.539474] -pid- flgs status -client- --rqstp- -timeout ---ops--
[31298.539478] 45744 c801      0 8c478d70 2473aa3b    13539  8531859
nfsv4 READ a:call_status [sunrpc] q:xprt_pending
[31298.539522] 45745 c801      0 8c478d70 50cbef7b    13541  8531859
nfsv4 READ a:call_status [sunrpc] q:xprt_pending
[31300.541040] -pid- flgs status -client- --rqstp- -timeout ---ops--
[31300.541044] 46048 c801      0 8c478d70 50cbef7b    17965  8531859
nfsv4 READ a:call_status [sunrpc] q:xprt_pending
[31300.541108] 46049 c801      0 8c478d70 2473aa3b    17965  8531859
nfsv4 READ a:call_status [sunrpc] q:xprt_pending
[31302.542632] -pid- flgs status -client- --rqstp- -timeout ---ops--
[31302.542635] 46922 c801      0 8c478d70 50cbef7b    18000  8531859
nfsv4 READ a:call_status [sunrpc] q:xprt_pending

---------------------------

[31375.046020] -pid- flgs status -client- --rqstp- -timeout ---ops--
[31375.046022] 47336 c801      0 8c478d70 cebaa9a6    15964  8531859
nfsv4 READ a:call_status [sunrpc] q:xprt_pending
[31375.046048] 47337 c801      0 8c478d70 f7145d4d    15964  8531859
nfsv4 READ a:call_status [sunrpc] q:xprt_pending
[31377.047361] -pid- flgs status -client- --rqstp- -timeout ---ops--
[31377.047365] 47336 c801      0 8c478d70 cebaa9a6    15364  8531859
nfsv4 READ a:call_status [sunrpc] q:xprt_pending
[31377.047406] 47337 c801      0 8c478d70 f7145d4d    15364  8531859
nfsv4 READ a:call_status [sunrpc] q:xprt_pending
[31379.048801] -pid- flgs status -client- --rqstp- -timeout ---ops--
[31379.048805] 47336 c801      0 8c478d70 cebaa9a6    14764  8531859
nfsv4 READ a:call_status [sunrpc] q:xprt_pending
[31379.048845] 47337 c801      0 8c478d70 f7145d4d    14764  8531859
nfsv4 READ a:call_status [sunrpc] q:xprt_pending
[31381.050376] -pid- flgs status -client- --rqstp- -timeout ---ops--
[31381.050380] 47336 c801      0 8c478d70 cebaa9a6    14163  8531859
nfsv4 READ a:call_status [sunrpc] q:xprt_pending
[31381.050421] 47337 c801      0 8c478d70 f7145d4d    14163  8531859
nfsv4 READ a:call_status [sunrpc] q:xprt_pending
[31383.051746] -pid- flgs status -client- --rqstp- -timeout ---ops--
[31383.051749] 47336 c801      0 8c478d70 cebaa9a6    13563  8531859
nfsv4 READ a:call_status [sunrpc] q:xprt_pending
[31383.051775] 47337 c801      0 8c478d70 f7145d4d    13563  8531859
nfsv4 READ a:call_status [sunrpc] q:xprt_pending
[31385.053182] -pid- flgs status -client- --rqstp- -timeout ---ops--
[31385.053185] 47342 c801      0 8c478d70 f7145d4d    17980  8531859
nfsv4 READ a:call_status [sunrpc] q:xprt_pending
[31387.054422] -pid- flgs status -client- --rqstp- -timeout ---ops--
[31387.054426] 47818 c801      0 8c478d70 f7145d4d    17761  8531859
nfsv4 READ a:call_status [sunrpc] q:xprt_pending
[31387.054467] 47819 c801      0 8c478d70 cebaa9a6    17762  8531859
nfsv4 READ a:call_status [sunrpc] q:xprt_pending


- mike


On Sun, Aug 8, 2021 at 5:28 PM Mike Javorski <mike.javorski@gmail.com> wrote:
>
> Appreciate the insights Neil. I will make an attempt to
> trigger/capture the fault if I can.
>
> - mike
>
> On Sun, Aug 8, 2021 at 5:01 PM NeilBrown <neilb@suse.de> wrote:
> >
> > On Mon, 09 Aug 2021, Mike Javorski wrote:
> > > I have been experiencing nfs file access hangs with multiple release
> > > versions of the 5.13.x linux kernel. In each case, all file transfers
> > > freeze for 5-10 seconds and then resume. This seems worse when reading
> > > through many files sequentially.
> >
> > A particularly useful debugging tool for NFS freezes is to run
> >
> >   rpcdebug -m rpc -c all
> >
> > while the system appears frozen.  As you only have a 5-10 second window
> > this might be tricky.
> > Setting or clearing debug flags in the rpc module (whether they are
> > already set or not) has a side effect of listing all RPC "tasks" which are
> > waiting for a reply.  Seeing that task list can often be useful.
> >
> > The task list appears in "dmesg" output.  If there are no tasks
> > waiting, nothing will be written which might lead you to think it didn't
> > work.
> >
> > As Chuck hinted, tcpdump is invaluable for this sort of problem.
> >   tcpdump -s 0 -w /tmp/somefile.pcap port 2049
> >
> > will capture NFS traffic.  If this can start before a hang, and finish
> > after, it may contain useful information.  Doing that in a way that
> > doesn't create an enormous file might be a challenge.  It would help if
> > you found a way to trigger the problem.  Take note of the circumstances
> > when it seems to happen the most.  If you can only produce a large file,
> > we can probably still work with it.
> >   tshark -r /tmp/somefile.pcap
> > will report the capture one line per packet.  You can look for the
> > appropriate timestamp, note the frame numbers, and use "editcap"
> > to extract a suitable range of packets.
> >
> > NeilBrown
> >
> >
> > >
> > > My server:
> > > - Archlinux w/ a distribution provided kernel package
> > > - filesystems exported with "rw,sync,no_subtree_check,insecure" options
> > >
> > > Client:
> > > - Archlinux w/ latest distribution provided kernel (5.13.9-arch1-1 at writing)
> > > - nfs mounted via /net autofs with "soft,nodev,nosuid" options
> > > (ver=4.2 is indicated in mount)
> > >
> > > I have tried the 5.13.x kernel several times since the first arch
> > > release (most recently with 5.13.9-arch1-1), all with similar results.
> > > Each time, I am forced to downgrade the linux package to a 5.12.x
> > > kernel (5.12.15-arch1 as of writing) to clear up the transfer issues
> > > and stabilize performance. No other changes are made between tests. I
> > > have confirmed the freezing behavior using both ext4 and btrfs
> > > filesystems exported from this server.
> > >
> > > At this point I would appreciate some guidance in what to provide in
> > > order to diagnose and resolve this issue. I don't have a lot of kernel
> > > debugging experience, so instruction would be helpful.
> > >
> > > - mike
> > >
> > >

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: NFS server regression in kernel 5.13 (tested w/ 5.13.9)
  2021-08-10  0:50     ` Mike Javorski
@ 2021-08-10  1:28       ` NeilBrown
  2021-08-10 11:54         ` Daire Byrne
  2021-08-13  1:51         ` Mike Javorski
  0 siblings, 2 replies; 47+ messages in thread
From: NeilBrown @ 2021-08-10  1:28 UTC (permalink / raw)
  To: Mike Javorski; +Cc: linux-nfs

On Tue, 10 Aug 2021, Mike Javorski wrote:
> Managed to get the rpcdebug info for some freezes today. I had a loop
> running to call the rpcdebug command that Neil provided and then sleep
> for 2 seconds. Each triggering event was the opening of a new video
> file (I was shuffling through files with mpv). I haven't managed to
> get a tcp capture yet trying to get one that isn't too massive, but
> the timing is tricky, and this doesn't happen every time.
> 

Thanks for the logs.

They show consistent "a:call_status" and "q:xprt_pending".  They also
show a timeout in the range 10000-20000.  This number is in 'jiffies',
which is normally millisecs, so 10 to 20 seconds.

That means the RPC layer (thinks it has) sent a request (made a call)
and is waiting for a reply to arrive over the network transport (i.e.
the TCP connection).  However that would typically involve a much longer
timeout.  For NFSv3 and 4.0, 60 seconds is the norm.  For NFSv4.1 I
*think* there is no timeout as NFS never resends requests unless the TCP
connection is broken.  So I don't know what to make of that.

Given the fact that it recovers in 5-10 seconds, it seems very likely
that it actually has sent the request (presumably the timeout doesn't
expire).  Most likely the server is being delayed in replying.  A
tcpdump trace would confirm this.

I don't think you've said anything about what NFS server you are using. 
Is it Linux, or something else?  If Linux, what kernel do you run there?

One thing that might cause a delay when accessing a file is if some
other client has a 'delegation' and takes a while to return it.
e.g.
  client2 creates a file for writing
  server gives it a 'write delegation'
  client2 queues up lots of write requests
  client1 asks to read the file.
  server says to client2 "please return the delegation"
  client2 starts flushing its data, but either takes too long or 
   has some other hiccup. Eventually it is returned, or the server
   revokes it
  server tells client1 it can read now.

This is leading to me asking: do you have other NFS clients that might
be accessing the same file, or might have accessed it recently?
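
(If other clients do turn out to be in the picture, one temporary way to
rule delegations out, assuming the server is a Linux knfsd, is to stop
new leases, and hence new delegations, from being granted:

  echo 0 > /proc/sys/fs/leases-enable

This only affects delegations handed out after the change, and can be
reverted by writing 1 back.)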

NeilBrown


> -----------------------------
> 
> [31229.965368] -pid- flgs status -client- --rqstp- -timeout ---ops--
> [31229.965370] 44224 c801      0 8c478d70 2473aa3b    15395  8531859
> nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> [31229.965411] 44225 c801      0 8c478d70 a7d3b9b8    15395  8531859
> nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> [31231.966982] -pid- flgs status -client- --rqstp- -timeout ---ops--
> [31231.966986] 44224 c801      0 8c478d70 2473aa3b    14795  8531859
> nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> [31231.967027] 44225 c801      0 8c478d70 a7d3b9b8    14795  8531859
> nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> [31233.968741] -pid- flgs status -client- --rqstp- -timeout ---ops--
> [31233.968745] 44224 c801      0 8c478d70 2473aa3b    14194  8531859
> nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> [31233.968787] 44225 c801      0 8c478d70 a7d3b9b8    14194  8531859
> nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> 
> -----------------------------
> 
> [31292.534529] -pid- flgs status -client- --rqstp- -timeout ---ops--
> [31292.534530] 45744 c801      0 8c478d70 2473aa3b    15340  8531859
> nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> [31292.534554] 45745 c801      0 8c478d70 50cbef7b    15342  8531859
> nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> [31294.536203] -pid- flgs status -client- --rqstp- -timeout ---ops--
> [31294.536206] 45744 c801      0 8c478d70 2473aa3b    14740  8531859
> nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> [31294.536232] 45745 c801      0 8c478d70 50cbef7b    14742  8531859
> nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> [31296.537826] -pid- flgs status -client- --rqstp- -timeout ---ops--
> [31296.537830] 45744 c801      0 8c478d70 2473aa3b    14139  8531859
> nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> [31296.537871] 45745 c801      0 8c478d70 50cbef7b    14141  8531859
> nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> [31298.539474] -pid- flgs status -client- --rqstp- -timeout ---ops--
> [31298.539478] 45744 c801      0 8c478d70 2473aa3b    13539  8531859
> nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> [31298.539522] 45745 c801      0 8c478d70 50cbef7b    13541  8531859
> nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> [31300.541040] -pid- flgs status -client- --rqstp- -timeout ---ops--
> [31300.541044] 46048 c801      0 8c478d70 50cbef7b    17965  8531859
> nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> [31300.541108] 46049 c801      0 8c478d70 2473aa3b    17965  8531859
> nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> [31302.542632] -pid- flgs status -client- --rqstp- -timeout ---ops--
> [31302.542635] 46922 c801      0 8c478d70 50cbef7b    18000  8531859
> nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> 
> ---------------------------
> 
> [31375.046020] -pid- flgs status -client- --rqstp- -timeout ---ops--
> [31375.046022] 47336 c801      0 8c478d70 cebaa9a6    15964  8531859
> nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> [31375.046048] 47337 c801      0 8c478d70 f7145d4d    15964  8531859
> nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> [31377.047361] -pid- flgs status -client- --rqstp- -timeout ---ops--
> [31377.047365] 47336 c801      0 8c478d70 cebaa9a6    15364  8531859
> nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> [31377.047406] 47337 c801      0 8c478d70 f7145d4d    15364  8531859
> nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> [31379.048801] -pid- flgs status -client- --rqstp- -timeout ---ops--
> [31379.048805] 47336 c801      0 8c478d70 cebaa9a6    14764  8531859
> nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> [31379.048845] 47337 c801      0 8c478d70 f7145d4d    14764  8531859
> nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> [31381.050376] -pid- flgs status -client- --rqstp- -timeout ---ops--
> [31381.050380] 47336 c801      0 8c478d70 cebaa9a6    14163  8531859
> nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> [31381.050421] 47337 c801      0 8c478d70 f7145d4d    14163  8531859
> nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> [31383.051746] -pid- flgs status -client- --rqstp- -timeout ---ops--
> [31383.051749] 47336 c801      0 8c478d70 cebaa9a6    13563  8531859
> nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> [31383.051775] 47337 c801      0 8c478d70 f7145d4d    13563  8531859
> nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> [31385.053182] -pid- flgs status -client- --rqstp- -timeout ---ops--
> [31385.053185] 47342 c801      0 8c478d70 f7145d4d    17980  8531859
> nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> [31387.054422] -pid- flgs status -client- --rqstp- -timeout ---ops--
> [31387.054426] 47818 c801      0 8c478d70 f7145d4d    17761  8531859
> nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> [31387.054467] 47819 c801      0 8c478d70 cebaa9a6    17762  8531859
> nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> 
> 
> - mike
> 
> 
> On Sun, Aug 8, 2021 at 5:28 PM Mike Javorski <mike.javorski@gmail.com> wrote:
> >
> > Appreciate the insights Neil. I will make an attempt to
> > trigger/capture the fault if I can.
> >
> > - mike
> >
> > On Sun, Aug 8, 2021 at 5:01 PM NeilBrown <neilb@suse.de> wrote:
> > >
> > > On Mon, 09 Aug 2021, Mike Javorski wrote:
> > > > I have been experiencing nfs file access hangs with multiple release
> > > > versions of the 5.13.x linux kernel. In each case, all file transfers
> > > > freeze for 5-10 seconds and then resume. This seems worse when reading
> > > > through many files sequentially.
> > >
> > > A particularly useful debugging tool for NFS freezes is to run
> > >
> > >   rpcdebug -m rpc -c all
> > >
> > > while the system appears frozen.  As you only have a 5-10 second window
> > > this might be tricky.
> > > Setting or clearing debug flags in the rpc module (whether they are
> > > already set or not) has a side effect of listing all RPC "tasks" which are
> > > waiting for a reply.  Seeing that task list can often be useful.
> > >
> > > The task list appears in "dmesg" output.  If there are no tasks
> > > waiting, nothing will be written which might lead you to think it didn't
> > > work.
> > >
> > > As Chuck hinted, tcpdump is invaluable for this sort of problem.
> > >   tcpdump -s 0 -w /tmp/somefile.pcap port 2049
> > >
> > > will capture NFS traffic.  If this can start before a hang, and finish
> > > after, it may contain useful information.  Doing that in a way that
> > > doesn't create an enormous file might be a challenge.  It would help if
> > > you found a way to trigger the problem.  Take note of the circumstances
> > > when it seems to happen the most.  If you can only produce a large file,
> > > we can probably still work with it.
> > >   tshark -r /tmp/somefile.pcap
> > > will report the capture one line per packet.  You can look for the
> > > appropriate timestamp, note the frame numbers, and use "editcap"
> > > to extract a suitable range of packets.
> > >
> > > NeilBrown
> > >
> > >
> > > >
> > > > My server:
> > > > - Archlinux w/ a distribution provided kernel package
> > > > - filesystems exported with "rw,sync,no_subtree_check,insecure" options
> > > >
> > > > Client:
> > > > - Archlinux w/ latest distribution provided kernel (5.13.9-arch1-1 at writing)
> > > > - nfs mounted via /net autofs with "soft,nodev,nosuid" options
> > > > (ver=4.2 is indicated in mount)
> > > >
> > > > I have tried the 5.13.x kernel several times since the first arch
> > > > release (most recently with 5.13.9-arch1-1), all with similar results.
> > > > Each time, I am forced to downgrade the linux package to a 5.12.x
> > > > kernel (5.12.15-arch1 as of writing) to clear up the transfer issues
> > > > and stabilize performance. No other changes are made between tests. I
> > > > have confirmed the freezing behavior using both ext4 and btrfs
> > > > filesystems exported from this server.
> > > >
> > > > At this point I would appreciate some guidance in what to provide in
> > > > order to diagnose and resolve this issue. I don't have a lot of kernel
> > > > debugging experience, so instruction would be helpful.
> > > >
> > > > - mike
> > > >
> > > >
> 
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: NFS server regression in kernel 5.13 (tested w/ 5.13.9)
  2021-08-10  1:28       ` NeilBrown
@ 2021-08-10 11:54         ` Daire Byrne
  2021-08-13  1:51         ` Mike Javorski
  1 sibling, 0 replies; 47+ messages in thread
From: Daire Byrne @ 2021-08-10 11:54 UTC (permalink / raw)
  To: NeilBrown; +Cc: Mike Javorski, linux-nfs

This sounds a bit like my bugzilla ticket:
https://bugzilla.kernel.org/show_bug.cgi?id=213887

But in my case the client rarely recovers from the timeout so the hang
becomes permanent. My setup is a bit more complicated though (NFS
re-exporting).

Daire

On Tue, 10 Aug 2021 at 02:29, NeilBrown <neilb@suse.de> wrote:
>
> On Tue, 10 Aug 2021, Mike Javorski wrote:
> > Managed to get the rpcdebug info for some freezes today. I had a loop
> > running to call the rpcdebug command that Neil provided and then sleep
> > for 2 seconds. Each triggering event was the opening of a new video
> > file (I was shuffling through files with mpv). I haven't managed to
> > get a tcp capture yet trying to get one that isn't too massive, but
> > the timing is tricky, and this doesn't happen every time.
> >
>
> Thanks for the logs.
>
> They show consistent "a:call_status" and "q:xprt_pending".  They also
> show a timeout in the range 10000-20000.  This number is in 'jiffies',
> which is normally millisecs, so 10 to 20 seconds.
>
> That means the RPC layer (thinks it has) sent a request (made a call)
> and is waiting for a reply to arrive over the network transport (i.e.
> the TCP connection).  However that would typically involve a much longer
> timeout.  For NFSv3 and 4.0, 60 seconds is the norm.  For NFSv4.1 I
> *think* there is no timeout as NFS never resends requests unless the TCP
> connection is broken.  So I don't know what to make of that.
>
> Given the fact that it recovers in 5-10 seconds, it seems very likely
> that it actually has sent the request (presumably the timeout doesn't
> expire).  Most likely the server is being delayed in replying.  A
> tcpdump trace would confirm this.
>
> I don't think you've said anything about what NFS server you are using.
> Is it Linux, or something else?  If Linux, what kernel do you run there?
>
> One thing that might cause a delay when accessing a file is if some
> other client has a 'delegation' and takes a while to return it.
> e.g.
>   client2 creates a file for writing
>   server gives it a 'write delegation'
>   client2 queues up lots of write request
>   client1 asks to read the file.
>   server says to client2 "please return the delegation"
>   client2 starts flushing its data, but either takes too long or
>    has some other hiccup. Eventually it is returned, or the server
>    revokes it
>   server tells client1 it can read now.
>
> This is leading to me asking: do you have other NFS clients that might
> be accessing the same file, or might have accessed it recently?
>
> NeilBrown
>
>
> > -----------------------------
> >
> > [31229.965368] -pid- flgs status -client- --rqstp- -timeout ---ops--
> > [31229.965370] 44224 c801      0 8c478d70 2473aa3b    15395  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31229.965411] 44225 c801      0 8c478d70 a7d3b9b8    15395  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31231.966982] -pid- flgs status -client- --rqstp- -timeout ---ops--
> > [31231.966986] 44224 c801      0 8c478d70 2473aa3b    14795  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31231.967027] 44225 c801      0 8c478d70 a7d3b9b8    14795  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31233.968741] -pid- flgs status -client- --rqstp- -timeout ---ops--
> > [31233.968745] 44224 c801      0 8c478d70 2473aa3b    14194  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31233.968787] 44225 c801      0 8c478d70 a7d3b9b8    14194  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> >
> > -----------------------------
> >
> > [31292.534529] -pid- flgs status -client- --rqstp- -timeout ---ops--
> > [31292.534530] 45744 c801      0 8c478d70 2473aa3b    15340  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31292.534554] 45745 c801      0 8c478d70 50cbef7b    15342  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31294.536203] -pid- flgs status -client- --rqstp- -timeout ---ops--
> > [31294.536206] 45744 c801      0 8c478d70 2473aa3b    14740  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31294.536232] 45745 c801      0 8c478d70 50cbef7b    14742  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31296.537826] -pid- flgs status -client- --rqstp- -timeout ---ops--
> > [31296.537830] 45744 c801      0 8c478d70 2473aa3b    14139  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31296.537871] 45745 c801      0 8c478d70 50cbef7b    14141  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31298.539474] -pid- flgs status -client- --rqstp- -timeout ---ops--
> > [31298.539478] 45744 c801      0 8c478d70 2473aa3b    13539  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31298.539522] 45745 c801      0 8c478d70 50cbef7b    13541  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31300.541040] -pid- flgs status -client- --rqstp- -timeout ---ops--
> > [31300.541044] 46048 c801      0 8c478d70 50cbef7b    17965  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31300.541108] 46049 c801      0 8c478d70 2473aa3b    17965  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31302.542632] -pid- flgs status -client- --rqstp- -timeout ---ops--
> > [31302.542635] 46922 c801      0 8c478d70 50cbef7b    18000  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> >
> > ---------------------------
> >
> > [31375.046020] -pid- flgs status -client- --rqstp- -timeout ---ops--
> > [31375.046022] 47336 c801      0 8c478d70 cebaa9a6    15964  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31375.046048] 47337 c801      0 8c478d70 f7145d4d    15964  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31377.047361] -pid- flgs status -client- --rqstp- -timeout ---ops--
> > [31377.047365] 47336 c801      0 8c478d70 cebaa9a6    15364  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31377.047406] 47337 c801      0 8c478d70 f7145d4d    15364  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31379.048801] -pid- flgs status -client- --rqstp- -timeout ---ops--
> > [31379.048805] 47336 c801      0 8c478d70 cebaa9a6    14764  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31379.048845] 47337 c801      0 8c478d70 f7145d4d    14764  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31381.050376] -pid- flgs status -client- --rqstp- -timeout ---ops--
> > [31381.050380] 47336 c801      0 8c478d70 cebaa9a6    14163  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31381.050421] 47337 c801      0 8c478d70 f7145d4d    14163  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31383.051746] -pid- flgs status -client- --rqstp- -timeout ---ops--
> > [31383.051749] 47336 c801      0 8c478d70 cebaa9a6    13563  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31383.051775] 47337 c801      0 8c478d70 f7145d4d    13563  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31385.053182] -pid- flgs status -client- --rqstp- -timeout ---ops--
> > [31385.053185] 47342 c801      0 8c478d70 f7145d4d    17980  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31387.054422] -pid- flgs status -client- --rqstp- -timeout ---ops--
> > [31387.054426] 47818 c801      0 8c478d70 f7145d4d    17761  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31387.054467] 47819 c801      0 8c478d70 cebaa9a6    17762  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> >
> >
> > - mike
> >
> >
> > On Sun, Aug 8, 2021 at 5:28 PM Mike Javorski <mike.javorski@gmail.com> wrote:
> > >
> > > Appreciate the insights Neil. I will make an attempt to
> > > trigger/capture the fault if I can.
> > >
> > > - mike
> > >
> > > On Sun, Aug 8, 2021 at 5:01 PM NeilBrown <neilb@suse.de> wrote:
> > > >
> > > > On Mon, 09 Aug 2021, Mike Javorski wrote:
> > > > > I have been experiencing nfs file access hangs with multiple release
> > > > > versions of the 5.13.x linux kernel. In each case, all file transfers
> > > > > freeze for 5-10 seconds and then resume. This seems worse when reading
> > > > > through many files sequentially.
> > > >
> > > > A particularly useful debugging tool for NFS freezes is to run
> > > >
> > > >   rpcdebug -m rpc -c all
> > > >
> > > > while the system appears frozen.  As you only have a 5-10 second window
> > > > this might be tricky.
> > > > Setting or clearing debug flags in the rpc module (whether they are
> > > > already set or not) has a side effect of listing all RPC "tasks" which are
> > > > waiting for a reply.  Seeing that task list can often be useful.
> > > >
> > > > The task list appears in "dmesg" output.  If there are no tasks
> > > > waiting, nothing will be written which might lead you to think it didn't
> > > > work.
> > > >
> > > > As Chuck hinted, tcpdump is invaluable for this sort of problem.
> > > >   tcpdump -s 0 -w /tmp/somefile.pcap port 2049
> > > >
> > > > will capture NFS traffic.  If this can start before a hang, and finish
> > > > after, it may contain useful information.  Doing that in a way that
> > > > doesn't create an enormous file might be a challenge.  It would help if
> > > > you found a way to trigger the problem.  Take note of the circumstances
> > > > when it seems to happen the most.  If you can only produce a large file,
> > > > we can probably still work with it.
> > > >   tshark -r /tmp/somefile.pcap
> > > > will report the capture one line per packet.  You can look for the
> > > > appropriate timestamp, note the frame numbers, and use "editcap"
> > > > to extract a suitable range of packets.
> > > >
> > > > NeilBrown
> > > >
> > > >
> > > > >
> > > > > My server:
> > > > > - Archlinux w/ a distribution provided kernel package
> > > > > - filesystems exported with "rw,sync,no_subtree_check,insecure" options
> > > > >
> > > > > Client:
> > > > > - Archlinux w/ latest distribution provided kernel (5.13.9-arch1-1 at writing)
> > > > > - nfs mounted via /net autofs with "soft,nodev,nosuid" options
> > > > > (ver=4.2 is indicated in mount)
> > > > >
> > > > > I have tried the 5.13.x kernel several times since the first arch
> > > > > release (most recently with 5.13.9-arch1-1), all with similar results.
> > > > > Each time, I am forced to downgrade the linux package to a 5.12.x
> > > > > kernel (5.12.15-arch1 as of writing) to clear up the transfer issues
> > > > > and stabilize performance. No other changes are made between tests. I
> > > > > have confirmed the freezing behavior using both ext4 and btrfs
> > > > > filesystems exported from this server.
> > > > >
> > > > > At this point I would appreciate some guidance in what to provide in
> > > > > order to diagnose and resolve this issue. I don't have a lot of kernel
> > > > > debugging experience, so instruction would be helpful.
> > > > >
> > > > > - mike
> > > > >
> > > > >
> >
> >

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: NFS server regression in kernel 5.13 (tested w/ 5.13.9)
  2021-08-10  1:28       ` NeilBrown
  2021-08-10 11:54         ` Daire Byrne
@ 2021-08-13  1:51         ` Mike Javorski
  2021-08-13  2:39           ` NeilBrown
  1 sibling, 1 reply; 47+ messages in thread
From: Mike Javorski @ 2021-08-13  1:51 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-nfs

Neil:

Apologies for the delay, your message didn't get properly flagged in my email.

To answer your questions, both client (my Desktop PC) and server (my
NAS) are running ArchLinux; client w/ current kernel (5.13.9), server
w/ current or alternate testing kernels (see below).
The server is running the kernel nfs server.
The client is using the built-in nfs client, and the "mount" output
indicates it is using nfs with "ver=4.2".
Both machines are plugged into the same network switch.
At time of testing, there is only one client active.
The files in this case are static media files (some years old), and
there are no co-occurring writes to the files being read (or even to
the same directory).

It was suggested on my ArchLinux bug report that I try a kernel from
just before the 5.13 NFS changes landed, and one from right after, to
see if they trigger the behavior, as well as trying a vanilla 5.14 rc
kernel. I still need to test more, but I was able to get the freezes
with 5.14-rc4; results are so far inconclusive for the earlier 5.13
kernels. I intend to spend some time this weekend attempting to get
the tcpdump. My initial attempts wound up with 400+ MB files, which
would be difficult to ship and use for diagnostics.
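
One idea I may try (untested so far) is tcpdump's ring-buffer options, so
that only the most recent traffic is kept and I can stop the capture right
after a freeze; something along these lines, with the file size and count
just guesses on my part:

  tcpdump -s 0 -C 100 -W 5 -w /tmp/nfs-ring.pcap port 2049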

- mike


On Mon, Aug 9, 2021 at 6:28 PM NeilBrown <neilb@suse.de> wrote:
>
> On Tue, 10 Aug 2021, Mike Javorski wrote:
> > Managed to get the rpcdebug info for some freezes today. I had a loop
> > running to call the rpcdebug command that Neil provided and then sleep
> > for 2 seconds. Each triggering event was the opening of a new video
> > file (I was shuffling through files with mpv). I haven't managed to
> > get a tcp capture yet trying to get one that isn't too massive, but
> > the timing is tricky, and this doesn't happen every time.
> >
>
> Thanks for the logs.
>
> They show consistent "a:call_status" and "q:xprt_pending".  They also
> show a timeout in the range 10000-20000.  This number is in 'jiffies',
> which is normally millisecs, so 10 to 20 seconds.
>
> That means the RPC layer (thinks it has) sent a request (made a call)
> and is waiting for a reply to arrive over the network transport (i.e.
> the TCP connection).  However that would typically involve a much longer
> timeout.  For NFSv3 and 4.0, 60 seconds is the norm.  For NFSv4.1 I
> *think* there is no timeout as NFS never resends requests unless the TCP
> connection is broken.  So I don't know what to make of that.
>
> Given the fact that it recovers in 5-10 seconds, it seems very likely
> that it actually has sent the request (presumably the timeout doesn't
> expire).  Most likely the server is being delayed in replying.  A
> tcpdump trace would confirm this.
>
> I don't think you've said anything about what NFS server you are using.
> Is it Linux, or something else?  If Linux, what kernel do you run there?
>
> One thing that might cause a delay when accessing a file is if some
> other client has a 'delegation' and takes a while to return it.
> e.g.
>   client2 creates a file for writing
>   server gives it a 'write delegation'
>   client2 queues up lots of write request
>   client1 asks to read the file.
>   server says to client2 "please return the delegation"
>   client2 starts flushing its data, but either takes too long or
>    has some other hiccup. Eventually it is returned, or the server
>    revokes it
>   server tells client1 it can read now.
>
> This is leading to me asking: do you have other NFS clients that might
> be accessing the same file, or might have accessed it recently?
>
> NeilBrown
>
>
> > -----------------------------
> >
> > [31229.965368] -pid- flgs status -client- --rqstp- -timeout ---ops--
> > [31229.965370] 44224 c801      0 8c478d70 2473aa3b    15395  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31229.965411] 44225 c801      0 8c478d70 a7d3b9b8    15395  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31231.966982] -pid- flgs status -client- --rqstp- -timeout ---ops--
> > [31231.966986] 44224 c801      0 8c478d70 2473aa3b    14795  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31231.967027] 44225 c801      0 8c478d70 a7d3b9b8    14795  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31233.968741] -pid- flgs status -client- --rqstp- -timeout ---ops--
> > [31233.968745] 44224 c801      0 8c478d70 2473aa3b    14194  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31233.968787] 44225 c801      0 8c478d70 a7d3b9b8    14194  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> >
> > -----------------------------
> >
> > [31292.534529] -pid- flgs status -client- --rqstp- -timeout ---ops--
> > [31292.534530] 45744 c801      0 8c478d70 2473aa3b    15340  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31292.534554] 45745 c801      0 8c478d70 50cbef7b    15342  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31294.536203] -pid- flgs status -client- --rqstp- -timeout ---ops--
> > [31294.536206] 45744 c801      0 8c478d70 2473aa3b    14740  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31294.536232] 45745 c801      0 8c478d70 50cbef7b    14742  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31296.537826] -pid- flgs status -client- --rqstp- -timeout ---ops--
> > [31296.537830] 45744 c801      0 8c478d70 2473aa3b    14139  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31296.537871] 45745 c801      0 8c478d70 50cbef7b    14141  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31298.539474] -pid- flgs status -client- --rqstp- -timeout ---ops--
> > [31298.539478] 45744 c801      0 8c478d70 2473aa3b    13539  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31298.539522] 45745 c801      0 8c478d70 50cbef7b    13541  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31300.541040] -pid- flgs status -client- --rqstp- -timeout ---ops--
> > [31300.541044] 46048 c801      0 8c478d70 50cbef7b    17965  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31300.541108] 46049 c801      0 8c478d70 2473aa3b    17965  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31302.542632] -pid- flgs status -client- --rqstp- -timeout ---ops--
> > [31302.542635] 46922 c801      0 8c478d70 50cbef7b    18000  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> >
> > ---------------------------
> >
> > [31375.046020] -pid- flgs status -client- --rqstp- -timeout ---ops--
> > [31375.046022] 47336 c801      0 8c478d70 cebaa9a6    15964  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31375.046048] 47337 c801      0 8c478d70 f7145d4d    15964  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31377.047361] -pid- flgs status -client- --rqstp- -timeout ---ops--
> > [31377.047365] 47336 c801      0 8c478d70 cebaa9a6    15364  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31377.047406] 47337 c801      0 8c478d70 f7145d4d    15364  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31379.048801] -pid- flgs status -client- --rqstp- -timeout ---ops--
> > [31379.048805] 47336 c801      0 8c478d70 cebaa9a6    14764  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31379.048845] 47337 c801      0 8c478d70 f7145d4d    14764  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31381.050376] -pid- flgs status -client- --rqstp- -timeout ---ops--
> > [31381.050380] 47336 c801      0 8c478d70 cebaa9a6    14163  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31381.050421] 47337 c801      0 8c478d70 f7145d4d    14163  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31383.051746] -pid- flgs status -client- --rqstp- -timeout ---ops--
> > [31383.051749] 47336 c801      0 8c478d70 cebaa9a6    13563  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31383.051775] 47337 c801      0 8c478d70 f7145d4d    13563  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31385.053182] -pid- flgs status -client- --rqstp- -timeout ---ops--
> > [31385.053185] 47342 c801      0 8c478d70 f7145d4d    17980  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31387.054422] -pid- flgs status -client- --rqstp- -timeout ---ops--
> > [31387.054426] 47818 c801      0 8c478d70 f7145d4d    17761  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> > [31387.054467] 47819 c801      0 8c478d70 cebaa9a6    17762  8531859
> > nfsv4 READ a:call_status [sunrpc] q:xprt_pending
> >
> >
> > - mike
> >
> >
> > On Sun, Aug 8, 2021 at 5:28 PM Mike Javorski <mike.javorski@gmail.com> wrote:
> > >
> > > Appreciate the insights Neil. I will make an attempt to
> > > trigger/capture the fault if I can.
> > >
> > > - mike
> > >
> > > On Sun, Aug 8, 2021 at 5:01 PM NeilBrown <neilb@suse.de> wrote:
> > > >
> > > > On Mon, 09 Aug 2021, Mike Javorski wrote:
> > > > > I have been experiencing nfs file access hangs with multiple release
> > > > > versions of the 5.13.x linux kernel. In each case, all file transfers
> > > > > freeze for 5-10 seconds and then resume. This seems worse when reading
> > > > > through many files sequentially.
> > > >
> > > > A particularly useful debugging tool for NFS freezes is to run
> > > >
> > > >   rpcdebug -m rpc -c all
> > > >
> > > > while the system appears frozen.  As you only have a 5-10 second window
> > > > this might be tricky.
> > > > Setting or clearing debug flags in the rpc module (whether they are
> > > > already set or not) has a side effect of listing all RPC "tasks" which are
> > > > waiting for a reply.  Seeing that task list can often be useful.
> > > >
> > > > The task list appears in "dmesg" output.  If there are no tasks
> > > > waiting, nothing will be written which might lead you to think it didn't
> > > > work.
> > > >
> > > > As Chuck hinted, tcpdump is invaluable for this sort of problem.
> > > >   tcpdump -s 0 -w /tmp/somefile.pcap port 2049
> > > >
> > > > will capture NFS traffic.  If this can start before a hang, and finish
> > > > after, it may contain useful information.  Doing that in a way that
> > > > doesn't create an enormous file might be a challenge.  It would help if
> > > > you found a way to trigger the problem.  Take note of the circumstances
> > > > when it seems to happen the most.  If you can only produce a large file,
> > > > we can probably still work with it.
> > > >   tshark -r /tmp/somefile.pcap
> > > > will report the capture one line per packet.  You can look for the
> > > > appropriate timestamp, note the frame numbers, and use "editcap"
> > > > to extract a suitable range of packets.
> > > >
> > > > NeilBrown
> > > >
> > > >
> > > > >
> > > > > My server:
> > > > > - Archlinux w/ a distribution provided kernel package
> > > > > - filesystems exported with "rw,sync,no_subtree_check,insecure" options
> > > > >
> > > > > Client:
> > > > > - Archlinux w/ latest distribution provided kernel (5.13.9-arch1-1 at writing)
> > > > > - nfs mounted via /net autofs with "soft,nodev,nosuid" options
> > > > > (ver=4.2 is indicated in mount)
> > > > >
> > > > > I have tried the 5.13.x kernel several times since the first arch
> > > > > release (most recently with 5.13.9-arch1-1), all with similar results.
> > > > > Each time, I am forced to downgrade the linux package to a 5.12.x
> > > > > kernel (5.12.15-arch1 as of writing) to clear up the transfer issues
> > > > > and stabilize performance. No other changes are made between tests. I
> > > > > have confirmed the freezing behavior using both ext4 and btrfs
> > > > > filesystems exported from this server.
> > > > >
> > > > > At this point I would appreciate some guidance in what to provide in
> > > > > order to diagnose and resolve this issue. I don't have a lot of kernel
> > > > > debugging experience, so instruction would be helpful.
> > > > >
> > > > > - mike
> > > > >
> > > > >
> >
> >

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: NFS server regression in kernel 5.13 (tested w/ 5.13.9)
  2021-08-13  1:51         ` Mike Javorski
@ 2021-08-13  2:39           ` NeilBrown
  2021-08-13  2:53             ` Mike Javorski
  0 siblings, 1 reply; 47+ messages in thread
From: NeilBrown @ 2021-08-13  2:39 UTC (permalink / raw)
  To: Mike Javorski; +Cc: linux-nfs

On Fri, 13 Aug 2021, Mike Javorski wrote:
> Neil:
> 
> Apologies for the delay, your message didn't get properly flagged in my email.

:-)

> 
> To answer your questions, both client (my Desktop PC) and server (my
> NAS) are running ArchLinux; client w/ current kernel (5.13.9), server
> w/ current or alternate testing kernels (see below).

So the bug could be in the server or the client.  I assume you are
careful to test a client against a known-good server, or a server against
a known-good client.

>                                                                 I
> intend to spend some time this weekend attempting to get the tcpdump.
> My initial attempts wound up with 400+Mb files which would be
> difficult to ship and use for diagnostics.

Rather than you sending me the dump, I'll send you the code.

Run
  tshark -r filename -d tcp.port==2049,rpc -Y 'tcp.port==2049 && rpc.time > 1'

This will ensure the NFS traffic is actually decoded as NFS and then
report only NFS(rpc) replies that arrive more than 1 second after the
request.
You can add

    -T fields -e frame.number -e rpc.time

to find out what the actual delay was.
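Putting those together, the whole thing looks something like:

  tshark -r filename -d tcp.port==2049,rpc \
    -Y 'tcp.port==2049 && rpc.time > 1' \
    -T fields -e frame.number -e rpc.time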

If it reports any, that will be interesting.  Try with a larger time if
necessary to get a modest number of hits.  Using editcap and the given
frame number you can select out 1000 packets either side of the problem
and that should compress to be small enough to transport.
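For example, if the interesting reply is around frame 5000, something
like this (the frame numbers are just placeholders) keeps a window
around it and compresses it:

  editcap -r /tmp/somefile.pcap /tmp/slice.pcap 4000-6000
  gzip /tmp/slice.pcap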

However it might not find anything.  If the reply never arrives, you'll
never get a reply with a long timeout.  So we need to check that
everything got a reply...

 tshark -r filename -d tcp.port==2049,rpc  \
   -Y 'tcp.port==2049 && rpc.msgtyp == 0' -T fields \
   -e rpc.xid -e frame.number | sort > /tmp/requests

 tshark -r filename -d tcp.port==2049,rpc  \
   -Y 'tcp.port==2049 && rpc.msgtyp == 1' -T fields \
   -e rpc.xid -e frame.number | sort > /tmp/replies

 join -a1 /tmp/requests /tmp/replies | awk 'NF==2'

This should list the xid and frame number of all requests that didn't
get a reply.  Again, editcap can extract a range of frames into a file of
manageable size.

Another possibility is that requests are getting replies, but the reply
says "NFS4ERR_DELAY"

 tshark -r filename -d tcp.port==2049,rpc -Y nfs.nfsstat4==10008

should report any reply with that error code.

Hopefully something there will be interesting.

NeilBrown

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: NFS server regression in kernel 5.13 (tested w/ 5.13.9)
  2021-08-13  2:39           ` NeilBrown
@ 2021-08-13  2:53             ` Mike Javorski
  2021-08-15  1:23               ` Mike Javorski
  0 siblings, 1 reply; 47+ messages in thread
From: Mike Javorski @ 2021-08-13  2:53 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-nfs

The "semi-known-good" has been the client. I tried updating the server
multiple times to a 5.13 kernel and each time had to downgrade to the
last 5.12 kernel that ArchLinux released (5.12.15) to stabilize
performance. At each attempt, the client was running the same 5.13
kernel that was being deployed to the server. I never downgraded the
client.

Thank you for the scripts and all the details, I will test things out
this weekend when I can dedicate time to it.

- mike

On Thu, Aug 12, 2021 at 7:39 PM NeilBrown <neilb@suse.de> wrote:
>
> On Fri, 13 Aug 2021, Mike Javorski wrote:
> > Neil:
> >
> > Apologies for the delay, your message didn't get properly flagged in my email.
>
> :-)
>
> >
> > To answer your questions, both client (my Desktop PC) and server (my
> > NAS) are running ArchLinux; client w/ current kernel (5.13.9), server
> > w/ current or alternate testing kernels (see below).
>
> So the bug could be in the server or the client.  I assume you are
> careful to test a client against a know-good server, or a server against
> a known-good client.
>
> >                                                                 I
> > intend to spend some time this weekend attempting to get the tcpdump.
> > My initial attempts wound up with 400+Mb files which would be
> > difficult to ship and use for diagnostics.
>
> Rather than you sending me the dump, I'll send you the code.
>
> Run
>   tshark -r filename -d tcp.port==2049,rpc -Y 'tcp.port==2049 && rpc.time > 1'
>
> This will ensure the NFS traffic is actually decoded as NFS and then
> report only NFS(rpc) replies that arrive more than 1 second after the
> request.
> You can add
>
>     -T fields -e frame.number -e rpc.time
>
> to find out what the actual delay was.
>
> If it reports any, that will be interesting.  Try with a larger time if
> necessary to get a modest number of hits.  Using editcap and the given
> frame number you can select out 1000 packets either side of the problem
> and that should compress to be small enough to transport.
>
> However it might not find anything.  If the reply never arrives, you'll
> never get a reply with a long timeout.  So we need to check that
> everything got a reply...
>
>  tshark -r filename -t tcp.port==2049,rpc  \
>    -Y 'tcp.port==2049 && rpc.msg == 0' -T fields \
>    -e rpc.xid -e frame.number | sort > /tmp/requests
>
>  tshark -r filename -t tcp.port==2049,rpc  \
>    -Y 'tcp.port==2049 && rpc.msg == 1' -T fields \
>    -e rpc.xid -e frame.number | sort > /tmp/replies
>
>  join -a1 /tmp/requests /tmp/replies | awk 'NF==2'
>
> This should list the xid and frame number of all requests that didn't
> get a reply.  Again, editcap can extract a range of frames into a file of
> manageable size.
>
> Another possibility is that requests are getting replies, but the reply
> says "NFS4ERR_DELAY"
>
>  tshark -r filename -t tcp.port==2049,rpc -Y nfs.nfsstat4==10008
>
> should report any reply with that error code.
>
> Hopefully something there will be interesting.
>
> NeilBrown

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: NFS server regression in kernel 5.13 (tested w/ 5.13.9)
  2021-08-13  2:53             ` Mike Javorski
@ 2021-08-15  1:23               ` Mike Javorski
  2021-08-16  1:20                 ` NeilBrown
  0 siblings, 1 reply; 47+ messages in thread
From: Mike Javorski @ 2021-08-15  1:23 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-nfs

I managed to get a cap with several discrete freezes in it, and I
included a chunk with 5 of them in the span of ~3000 frames. I added
packet comments at each frame that the tshark command reported as > 1
sec RPC wait. Just search for "Freeze" in (wire|t)shark in packet
details. This is with kernel 5.13.10 provided by Arch (See
https://github.com/archlinux/linux/compare/a37da2be8e6c85...v5.13.10-arch1
for diff vs mainline, nothing NFS/RPC related I can identify).

I tried unsuccessfully to get any failures with the 5.12.15 kernel.

https://drive.google.com/file/d/1T42iX9xCdF9Oe4f7JXsnWqD8oJPrpMqV/view?usp=sharing

File should be downloadable anonymously.

- mike

On Thu, Aug 12, 2021 at 7:53 PM Mike Javorski <mike.javorski@gmail.com> wrote:
>
> The "semi-known-good" has been the client. I tried updating the server
> multiple times to a 5.13 kernel and each time had to downgrade to the
> last 5.12 kernel that ArchLinux released (5.12.15) to stabilize
> performance. At each attempt, the client was running the same 5.13
> kernel that was being deployed to the server. I never downgraded the
> client.
>
> Thank you for the scripts and all the details, I will test things out
> this weekend when I can dedicate time to it.
>
> - mike
>
> On Thu, Aug 12, 2021 at 7:39 PM NeilBrown <neilb@suse.de> wrote:
> >
> > On Fri, 13 Aug 2021, Mike Javorski wrote:
> > > Neil:
> > >
> > > Apologies for the delay, your message didn't get properly flagged in my email.
> >
> > :-)
> >
> > >
> > > To answer your questions, both client (my Desktop PC) and server (my
> > > NAS) are running ArchLinux; client w/ current kernel (5.13.9), server
> > > w/ current or alternate testing kernels (see below).
> >
> > So the bug could be in the server or the client.  I assume you are
> > careful to test a client against a know-good server, or a server against
> > a known-good client.
> >
> > >                                                                 I
> > > intend to spend some time this weekend attempting to get the tcpdump.
> > > My initial attempts wound up with 400+Mb files which would be
> > > difficult to ship and use for diagnostics.
> >
> > Rather than you sending me the dump, I'll send you the code.
> >
> > Run
> >   tshark -r filename -d tcp.port==2049,rpc -Y 'tcp.port==2049 && rpc.time > 1'
> >
> > This will ensure the NFS traffic is actually decoded as NFS and then
> > report only NFS(rpc) replies that arrive more than 1 second after the
> > request.
> > You can add
> >
> >     -T fields -e frame.number -e rpc.time
> >
> > to find out what the actual delay was.
> >
> > If it reports any, that will be interesting.  Try with a larger time if
> > necessary to get a modest number of hits.  Using editcap and the given
> > frame number you can select out 1000 packets either side of the problem
> > and that should compress to be small enough to transport.
> >
> > However it might not find anything.  If the reply never arrives, you'll
> > never get a reply with a long timeout.  So we need to check that
> > everything got a reply...
> >
> >  tshark -r filename -t tcp.port==2049,rpc  \
> >    -Y 'tcp.port==2049 && rpc.msg == 0' -T fields \
> >    -e rpc.xid -e frame.number | sort > /tmp/requests
> >
> >  tshark -r filename -t tcp.port==2049,rpc  \
> >    -Y 'tcp.port==2049 && rpc.msg == 1' -T fields \
> >    -e rpc.xid -e frame.number | sort > /tmp/replies
> >
> >  join -a1 /tmp/requests /tmp/replies | awk 'NF==2'
> >
> > This should list the xid and frame number of all requests that didn't
> > get a reply.  Again, editcap can extract a range of frames into a file of
> > manageable size.
> >
> > Another possibility is that requests are getting replies, but the reply
> > says "NFS4ERR_DELAY"
> >
> >  tshark -r filename -t tcp.port==2049,rpc -Y nfs.nfsstat4==10008
> >
> > should report any reply with that error code.
> >
> > Hopefully something there will be interesting.
> >
> > NeilBrown

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: NFS server regression in kernel 5.13 (tested w/ 5.13.9)
  2021-08-15  1:23               ` Mike Javorski
@ 2021-08-16  1:20                 ` NeilBrown
  2021-08-16 13:21                   ` Chuck Lever III
  2021-08-16 16:09                   ` Mike Javorski
  0 siblings, 2 replies; 47+ messages in thread
From: NeilBrown @ 2021-08-16  1:20 UTC (permalink / raw)
  To: Mike Javorski; +Cc: linux-nfs

On Sun, 15 Aug 2021, Mike Javorski wrote:
> I managed to get a cap with several discreet freezes in it, and I
> included a chunk with 5 of them in the span of ~3000 frames. I added
> packet comments at each frame that the tshark command reported as > 1
> sec RPC wait. Just search for "Freeze" in (wire|t)shark in packet
> details. This is with kernel 5.13.10 provided by Arch (See
> https://github.com/archlinux/linux/compare/a37da2be8e6c85...v5.13.10-arch1
> for diff vs mainline, nothing NFS/RPC related I can identify).
> 
> I tried unsuccessfully to get any failures with the 5.12.15 kernel.
> 
> https://drive.google.com/file/d/1T42iX9xCdF9Oe4f7JXsnWqD8oJPrpMqV/view?usp=sharing
> 
> File should be downloadable anonymously.

Got it, thanks.

There are 3 RPC replies that are more than 1 second after the request.
The replies are in frames 983, 1005, 1777.  These roughly correspond to
where you added the "Freeze" annotation (I didn't know you could do that!).

983:
  This is the (Start of the) reply to a READ Request in frame 980.
  It is a 128K read.  The whole reply has arrived by frame 1004, 2ms
  later.
  The request (Frame 980) is followed 13ms later by a TCP retransmit
  of the request and then (20usec later) a TCP ACK from the server.

  The fact that the client needed to retransmit seems a little odd
  but as it is the only retransmit in the whole capture, I don't think
  we can read much into it.

1005:
  This is the reply to a 128K READ request in frame 793 - earlier than
  the previous one.
  So there were two READ requests, then a 2 second delay then both
  replies in reverse order.

1777:
  This is a similar READ reply - to 1761.
  There were READs in 1760, 1761, and 1775
  1760 is replied to almost immediately
  1761 gets a reply in 4 seconds (1777)
  1775 never gets a reply (in the available packet capture).

Looking at other delays ... most READs get a reply in under a millisec.
There are about 20 where the reply is more than 1ms - the longest delay
not already mentioned is 90ms with reply 1857.
The pattern here is 
   READ req (1)
   GETATTR req
   GETATTR reply
   READ req (2)
   READ reply (1)
  pause
   READ reply (2)

I suspect this is the same problem occurring, but it isn't so
noticeable.

My first thought was that the reply might be getting stuck in the TCP
transmit queue on the server, but checking the TSval in the TCP
timestamp option shows that - for frame 983 which shows a 2-second delay
- the TSval is also 2 seconds later than the previous packet.  So the
delay happens before the TCP-level decision to create the packet.

So I cannot see any real evidence to suggest a TCP-level problem.
The times of 2 or 4 seconds - and maybe even 90ms - seem unlikely to be
caused by an NFSd problem.

So my guess is that the delay comes from the filesystem.  Maybe.
What filesystem are you exporting?

How can we check this? Probably by turning on nfsd tracing.
There are a bunch of tracepoints that relate to reading:

	trace_nfsd_read_start
	trace_nfsd_read_done
	trace_nfsd_read_io_done
	trace_nfsd_read_err
	trace_nfsd_read_splice
	trace_nfsd_read_vector

Maybe enabling them might be useful as you should be able to see if the
delay was within one read request, or between two read requests.
But I don't have much (if any) experience in enabling trace points.  I
really should try that some day.  Maybe you can find guidance on using
these tracepoints somewhere ... or maybe you already know how :-)
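From a quick look, the raw tracefs route is roughly the following
(untested, and assuming the usual tracefs mount point):

  cd /sys/kernel/tracing        # or /sys/kernel/debug/tracing on older setups
  echo 1 > events/nfsd/nfsd_read_start/enable
  echo 1 > events/nfsd/nfsd_read_done/enable
  cat trace_pipe > /tmp/nfsd-read.txt    # ^C once a freeze has been captured
  echo 0 > events/nfsd/enable            # turn the events back off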

NeilBrown

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: NFS server regression in kernel 5.13 (tested w/ 5.13.9)
  2021-08-16  1:20                 ` NeilBrown
@ 2021-08-16 13:21                   ` Chuck Lever III
  2021-08-16 16:25                     ` Mike Javorski
  2021-08-16 16:09                   ` Mike Javorski
  1 sibling, 1 reply; 47+ messages in thread
From: Chuck Lever III @ 2021-08-16 13:21 UTC (permalink / raw)
  To: Mike Javorski; +Cc: Linux NFS Mailing List, Neil Brown



> On Aug 15, 2021, at 9:20 PM, NeilBrown <neilb@suse.de> wrote:
> 
> On Sun, 15 Aug 2021, Mike Javorski wrote:
>> I managed to get a cap with several discreet freezes in it, and I
>> included a chunk with 5 of them in the span of ~3000 frames. I added
>> packet comments at each frame that the tshark command reported as > 1
>> sec RPC wait. Just search for "Freeze" in (wire|t)shark in packet
>> details. This is with kernel 5.13.10 provided by Arch (See
>> https://github.com/archlinux/linux/compare/a37da2be8e6c85...v5.13.10-arch1
>> for diff vs mainline, nothing NFS/RPC related I can identify).
>> 
>> I tried unsuccessfully to get any failures with the 5.12.15 kernel.
>> 
>> https://drive.google.com/file/d/1T42iX9xCdF9Oe4f7JXsnWqD8oJPrpMqV/view?usp=sharing
>> 
>> File should be downloadable anonymously.
> 
> Got it, thanks.
> 
> There are 3 RPC replies that are more than 1 second after the request.
> The replies are in frames 983, 1005, 1777 These roughly correspond to
> where you added the "Freeze" annotation (I didn't know you could do that!).
> 
> 983:
>  This is the (Start of the) reply to a READ Request in frame 980.
>  It is a 128K read.  The whole reply has arrived by frame 1004, 2ms
>  later.
>  The request (Frame 980) is followed 13ms later by a TCP retransmit
>  of the request and the (20usec later) a TCP ACK from the server.
> 
>  The fact that the client needed to retransmit seems a little odd
>  but as it is the only retransmit in the whole capture, I don't think
>  we can read much into it.
> 
> 1005:
>  This is the reply to a 128K READ request in frame 793 - earlier than
>  previous one.
>  So there were two READ requests, then a 2 second delay then both
>  replies in reverse order.
> 
> 1777:
>  This is a similar READ reply - to 1761.
>  There were READs in 1760, 1761, and 1775
>  1760 is replied to almost immediately
>  1761 gets a reply in 4 seconds (1777)
>  1775 never gets a reply (in the available packet capture).
> 
> Looking at other delays ... most READs get a reply in under a millisec.
> There are about 20 where the reply is more than 1ms - the longest delay
> not already mentioned is 90ms with reply 1857.
> The pattern here is 
>   READ req (1)
>   GETATTR req
>   GETATTR reply
>   READ req (2)
>   READ reply (1)
>  pause
>   READ reply (2)
> 
> I suspect this is the same problem occurring, but it isn't so
> noticeable.
> 
> My first thought was that the reply might be getting stuck in the TCP
> transmit queue on the server, but checking the TSval in the TCP
> timestamp option shows that - for frame 983 which shows a 2second delay
> - the TSval is also 2seconds later than the previous packet.  So the
> delay happens before the TCP-level decision to create the packet.
> 
> So I cannot see any real evidence to suggest a TCP-level problem.
> The time of 2 or 4 seconds - and maybe even 90ms - seem unlikely to be
> caused by an NFSd problem.
> 
> So my guess is that the delay comes from the filesystem.  Maybe.
> What filesystem are you exporting?
> 
> How can we check this? Probably by turning on nfsd tracing.
> There are a bunch of tracepoints that related to reading:
> 
> 	trace_nfsd_read_start
> 	trace_nfsd_read_done
> 	trace_nfsd_read_io_done
> 	trace_nfsd_read_err
> 	trace_nfsd_read_splice
> 	trace_nfsd_read_vector
> 	trace_nfsd_read_start
> 	trace_nfsd_read_done
> 
> Maybe enabling them might be useful as you should be able to see if the
> delay was within one read request, or between two read requests.
> But I don't have much (if any) experience in enabling trace points.  I
> really should try that some day.  Maybe you can find guidance on using
> these tracepoint somewhere ... or maybe you already know how :-)

Try something like:

# trace-cmd record -e nfsd:nfsd_read_\*

When the test is done, ^C the trace-cmd program. The trace records
are stored in a file called trace.dat. You can view them with:

# trace-cmd report | less

The trace.dat file is portable. It carries the format specifiers
for all recorded events, so the trace records can be viewed on
other systems.
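If you want a quick first pass over the records, something like this
(untested; it assumes the default "task-pid  [cpu]  timestamp: event: ..."
report layout) prints the per-xid start->done latencies, slowest last:

# trace-cmd report | awk '
    { t = $3 + 0; for (i = 4; i <= NF; i++) if ($i ~ /^xid=/) xid = $i }
    /nfsd_read_start:/ { start[xid] = t }
    /nfsd_read_done:/ && (xid in start) { printf "%s %.6f\n", xid, t - start[xid] }
  ' | sort -k2,2n | tail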


--
Chuck Lever




^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: NFS server regression in kernel 5.13 (tested w/ 5.13.9)
  2021-08-16  1:20                 ` NeilBrown
  2021-08-16 13:21                   ` Chuck Lever III
@ 2021-08-16 16:09                   ` Mike Javorski
  2021-08-16 23:04                     ` NeilBrown
  1 sibling, 1 reply; 47+ messages in thread
From: Mike Javorski @ 2021-08-16 16:09 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-nfs

Neil:

A possible insight, these captures were all made on the server, so the
rpc reply wouldn't be delayed (at least if the server is the one doing
the replying). I did the captures there because I wanted to get as
close to possible issues as I could. If it makes sense, I can try to
capture from the client side, let me know.

Filesystem being exported is BTRFS, a 6 disk array. I jumped to nfs
being the trouble because that is where I see the issue, but it's
certainly possible for it to be the underlying filesystem. Maybe the
nfsd traces will give insight to this.

- mike

On Sun, Aug 15, 2021 at 6:20 PM NeilBrown <neilb@suse.de> wrote:
>
> On Sun, 15 Aug 2021, Mike Javorski wrote:
> > I managed to get a cap with several discreet freezes in it, and I
> > included a chunk with 5 of them in the span of ~3000 frames. I added
> > packet comments at each frame that the tshark command reported as > 1
> > sec RPC wait. Just search for "Freeze" in (wire|t)shark in packet
> > details. This is with kernel 5.13.10 provided by Arch (See
> > https://github.com/archlinux/linux/compare/a37da2be8e6c85...v5.13.10-arch1
> > for diff vs mainline, nothing NFS/RPC related I can identify).
> >
> > I tried unsuccessfully to get any failures with the 5.12.15 kernel.
> >
> > https://drive.google.com/file/d/1T42iX9xCdF9Oe4f7JXsnWqD8oJPrpMqV/view?usp=sharing
> >
> > File should be downloadable anonymously.
>
> Got it, thanks.
>
> There are 3 RPC replies that are more than 1 second after the request.
> The replies are in frames 983, 1005, 1777 These roughly correspond to
> where you added the "Freeze" annotation (I didn't know you could do that!).
>
> 983:
>   This is the (Start of the) reply to a READ Request in frame 980.
>   It is a 128K read.  The whole reply has arrived by frame 1004, 2ms
>   later.
>   The request (Frame 980) is followed 13ms later by a TCP retransmit
>   of the request and the (20usec later) a TCP ACK from the server.
>
>   The fact that the client needed to retransmit seems a little odd
>   but as it is the only retransmit in the whole capture, I don't think
>   we can read much into it.
>
> 1005:
>   This is the reply to a 128K READ request in frame 793 - earlier than
>   previous one.
>   So there were two READ requests, then a 2 second delay then both
>   replies in reverse order.
>
> 1777:
>   This is a similar READ reply - to 1761.
>   There were READs in 1760, 1761, and 1775
>   1760 is replied to almost immediately
>   1761 gets a reply in 4 seconds (1777)
>   1775 never gets a reply (in the available packet capture).
>
> Looking at other delays ... most READs get a reply in under a millisec.
> There are about 20 where the reply is more than 1ms - the longest delay
> not already mentioned is 90ms with reply 1857.
> The pattern here is
>    READ req (1)
>    GETATTR req
>    GETATTR reply
>    READ req (2)
>    READ reply (1)
>   pause
>    READ reply (2)
>
> I suspect this is the same problem occurring, but it isn't so
> noticeable.
>
> My first thought was that the reply might be getting stuck in the TCP
> transmit queue on the server, but checking the TSval in the TCP
> timestamp option shows that - for frame 983 which shows a 2second delay
> - the TSval is also 2seconds later than the previous packet.  So the
> delay happens before the TCP-level decision to create the packet.
>
> So I cannot see any real evidence to suggest a TCP-level problem.
> The time of 2 or 4 seconds - and maybe even 90ms - seem unlikely to be
> caused by an NFSd problem.
>
> So my guess is that the delay comes from the filesystem.  Maybe.
> What filesystem are you exporting?
>
> How can we check this? Probably by turning on nfsd tracing.
> There are a bunch of tracepoints that related to reading:
>
>         trace_nfsd_read_start
>         trace_nfsd_read_done
>         trace_nfsd_read_io_done
>         trace_nfsd_read_err
>         trace_nfsd_read_splice
>         trace_nfsd_read_vector
>         trace_nfsd_read_start
>         trace_nfsd_read_done
>
> Maybe enabling them might be useful as you should be able to see if the
> delay was within one read request, or between two read requests.
> But I don't have much (if any) experience in enabling trace points.  I
> really should try that some day.  Maybe you can find guidance on using
> these tracepoint somewhere ... or maybe you already know how :-)
>
> NeilBrown

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: NFS server regression in kernel 5.13 (tested w/ 5.13.9)
  2021-08-16 13:21                   ` Chuck Lever III
@ 2021-08-16 16:25                     ` Mike Javorski
  2021-08-16 23:01                       ` NeilBrown
  0 siblings, 1 reply; 47+ messages in thread
From: Mike Javorski @ 2021-08-16 16:25 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Linux NFS Mailing List, Neil Brown

Thanks for the pointer Chuck

I ran the trace capture and was able to trigger two freezes in
relatively short order. Here is that trace file:
https://drive.google.com/file/d/1qk_VIMkj8aTeQIg5O0W3AvWyeDSWL3vW/view?usp=sharing

- mike


On Mon, Aug 16, 2021 at 6:21 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
>
>
>
> > On Aug 15, 2021, at 9:20 PM, NeilBrown <neilb@suse.de> wrote:
> >
> > On Sun, 15 Aug 2021, Mike Javorski wrote:
> >> I managed to get a cap with several discreet freezes in it, and I
> >> included a chunk with 5 of them in the span of ~3000 frames. I added
> >> packet comments at each frame that the tshark command reported as > 1
> >> sec RPC wait. Just search for "Freeze" in (wire|t)shark in packet
> >> details. This is with kernel 5.13.10 provided by Arch (See
> >> https://github.com/archlinux/linux/compare/a37da2be8e6c85...v5.13.10-arch1
> >> for diff vs mainline, nothing NFS/RPC related I can identify).
> >>
> >> I tried unsuccessfully to get any failures with the 5.12.15 kernel.
> >>
> >> https://drive.google.com/file/d/1T42iX9xCdF9Oe4f7JXsnWqD8oJPrpMqV/view?usp=sharing
> >>
> >> File should be downloadable anonymously.
> >
> > Got it, thanks.
> >
> > There are 3 RPC replies that are more than 1 second after the request.
> > The replies are in frames 983, 1005, 1777 These roughly correspond to
> > where you added the "Freeze" annotation (I didn't know you could do that!).
> >
> > 983:
> >  This is the (Start of the) reply to a READ Request in frame 980.
> >  It is a 128K read.  The whole reply has arrived by frame 1004, 2ms
> >  later.
> >  The request (Frame 980) is followed 13ms later by a TCP retransmit
> >  of the request and the (20usec later) a TCP ACK from the server.
> >
> >  The fact that the client needed to retransmit seems a little odd
> >  but as it is the only retransmit in the whole capture, I don't think
> >  we can read much into it.
> >
> > 1005:
> >  This is the reply to a 128K READ request in frame 793 - earlier than
> >  previous one.
> >  So there were two READ requests, then a 2 second delay then both
> >  replies in reverse order.
> >
> > 1777:
> >  This is a similar READ reply - to 1761.
> >  There were READs in 1760, 1761, and 1775
> >  1760 is replied to almost immediately
> >  1761 gets a reply in 4 seconds (1777)
> >  1775 never gets a reply (in the available packet capture).
> >
> > Looking at other delays ... most READs get a reply in under a millisec.
> > There are about 20 where the reply is more than 1ms - the longest delay
> > not already mentioned is 90ms with reply 1857.
> > The pattern here is
> >   READ req (1)
> >   GETATTR req
> >   GETATTR reply
> >   READ req (2)
> >   READ reply (1)
> >  pause
> >   READ reply (2)
> >
> > I suspect this is the same problem occurring, but it isn't so
> > noticeable.
> >
> > My first thought was that the reply might be getting stuck in the TCP
> > transmit queue on the server, but checking the TSval in the TCP
> > timestamp option shows that - for frame 983 which shows a 2second delay
> > - the TSval is also 2seconds later than the previous packet.  So the
> > delay happens before the TCP-level decision to create the packet.
> >
> > So I cannot see any real evidence to suggest a TCP-level problem.
> > The time of 2 or 4 seconds - and maybe even 90ms - seem unlikely to be
> > caused by an NFSd problem.
> >
> > So my guess is that the delay comes from the filesystem.  Maybe.
> > What filesystem are you exporting?
> >
> > How can we check this? Probably by turning on nfsd tracing.
> > There are a bunch of tracepoints that related to reading:
> >
> >       trace_nfsd_read_start
> >       trace_nfsd_read_done
> >       trace_nfsd_read_io_done
> >       trace_nfsd_read_err
> >       trace_nfsd_read_splice
> >       trace_nfsd_read_vector
> >       trace_nfsd_read_start
> >       trace_nfsd_read_done
> >
> > Maybe enabling them might be useful as you should be able to see if the
> > delay was within one read request, or between two read requests.
> > But I don't have much (if any) experience in enabling trace points.  I
> > really should try that some day.  Maybe you can find guidance on using
> > these tracepoint somewhere ... or maybe you already know how :-)
>
> Try something like:
>
> # trace-cmd record -e nfsd:nfsd_read_\*
>
> When the test is done, ^C the trace-cmd program. The trace records
> are stored in a file called trace.dat. You can view them with:
>
> # trace-cmd report | less
>
> The trace.dat file is portable. It carries the format specifiers
> for all records events, so the trace records can be viewed on
> other systems.
>
>
> --
> Chuck Lever
>
>
>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: NFS server regression in kernel 5.13 (tested w/ 5.13.9)
  2021-08-16 16:25                     ` Mike Javorski
@ 2021-08-16 23:01                       ` NeilBrown
  2021-08-20  0:31                         ` NeilBrown
  0 siblings, 1 reply; 47+ messages in thread
From: NeilBrown @ 2021-08-16 23:01 UTC (permalink / raw)
  To: Mike Javorski; +Cc: Chuck Lever III, Linux NFS Mailing List

On Tue, 17 Aug 2021, Mike Javorski wrote:
> Thanks for the pointer Chuck
> 
> I ran the trace capture and was able to trigger two freezes in
> relatively short order. Here is that trace file:
> https://drive.google.com/file/d/1qk_VIMkj8aTeQIg5O0W3AvWyeDSWL3vW/view?usp=sharing
> 

There are gaps of 5.3sec, 4sec, 1sec and 1sec between adjacent trace
points.
The longest gap between 'start' and 'done' for any given xid is 354msec.
So it doesn't look like the filesystem read is causing the problem.

The long gaps between adjacent records are:
0.354581 xid=0x4c6ac2b6
0.418340 xid=0x6a71c2b6
1.013260 xid=0x6f71c2b6
1.020742 xid=0x0f71c2b6
4.064527 xid=0x6171c2b6
5.396559 xid=0xd66dc2b6

The fact that 1, 1, and 4 are so close to a precise number of seconds seems
unlikely to be a co-incidence.  It isn't clear what it might mean
though.

I don't have any immediate ideas where to look next.  I'll let you know
if I come up with something.

NeilBrown

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: NFS server regression in kernel 5.13 (tested w/ 5.13.9)
  2021-08-16 16:09                   ` Mike Javorski
@ 2021-08-16 23:04                     ` NeilBrown
  0 siblings, 0 replies; 47+ messages in thread
From: NeilBrown @ 2021-08-16 23:04 UTC (permalink / raw)
  To: Mike Javorski; +Cc: linux-nfs

On Tue, 17 Aug 2021, Mike Javorski wrote:
> Neil:
> 
> A possible insight, these captures were all made on the server, so the
> rpc reply wouldn't be delayed (at least if the server is the one doing
> the replying). I did the captures there because I wanted to get as
> close to possible issues as I could. If it makes sense, I can try to
> capture from the client side, let me know.

Sometimes it does make sense to compare both ends of the piece of wire. 
However in this case the data collected at the server points to delays
on the server, so it is unlikely that there are delays or losses between
server and client.

Thanks,
NeilBrown

> 
> Filesystem being exported is BTRFS, a 6 disk array. I jumped to nfs
> being the trouble because that is my visibility to the issue, but it's
> certainly possible for it to be the underlying filesystem. Maybe the
> nfsd traces will give insight to this.
> 
> - mike
> 
> On Sun, Aug 15, 2021 at 6:20 PM NeilBrown <neilb@suse.de> wrote:
> >
> > On Sun, 15 Aug 2021, Mike Javorski wrote:
> > > I managed to get a cap with several discreet freezes in it, and I
> > > included a chunk with 5 of them in the span of ~3000 frames. I added
> > > packet comments at each frame that the tshark command reported as > 1
> > > sec RPC wait. Just search for "Freeze" in (wire|t)shark in packet
> > > details. This is with kernel 5.13.10 provided by Arch (See
> > > https://github.com/archlinux/linux/compare/a37da2be8e6c85...v5.13.10-arch1
> > > for diff vs mainline, nothing NFS/RPC related I can identify).
> > >
> > > I tried unsuccessfully to get any failures with the 5.12.15 kernel.
> > >
> > > https://drive.google.com/file/d/1T42iX9xCdF9Oe4f7JXsnWqD8oJPrpMqV/view?usp=sharing
> > >
> > > File should be downloadable anonymously.
> >
> > Got it, thanks.
> >
> > There are 3 RPC replies that are more than 1 second after the request.
> > The replies are in frames 983, 1005, 1777 These roughly correspond to
> > where you added the "Freeze" annotation (I didn't know you could do that!).
> >
> > 983:
> >   This is the (Start of the) reply to a READ Request in frame 980.
> >   It is a 128K read.  The whole reply has arrived by frame 1004, 2ms
> >   later.
> >   The request (Frame 980) is followed 13ms later by a TCP retransmit
> >   of the request and the (20usec later) a TCP ACK from the server.
> >
> >   The fact that the client needed to retransmit seems a little odd
> >   but as it is the only retransmit in the whole capture, I don't think
> >   we can read much into it.
> >
> > 1005:
> >   This is the reply to a 128K READ request in frame 793 - earlier than
> >   previous one.
> >   So there were two READ requests, then a 2 second delay then both
> >   replies in reverse order.
> >
> > 1777:
> >   This is a similar READ reply - to 1761.
> >   There were READs in 1760, 1761, and 1775
> >   1760 is replied to almost immediately
> >   1761 gets a reply in 4 seconds (1777)
> >   1775 never gets a reply (in the available packet capture).
> >
> > Looking at other delays ... most READs get a reply in under a millisec.
> > There are about 20 where the reply is more than 1ms - the longest delay
> > not already mentioned is 90ms with reply 1857.
> > The pattern here is
> >    READ req (1)
> >    GETATTR req
> >    GETATTR reply
> >    READ req (2)
> >    READ reply (1)
> >   pause
> >    READ reply (2)
> >
> > I suspect this is the same problem occurring, but it isn't so
> > noticeable.
> >
> > My first thought was that the reply might be getting stuck in the TCP
> > transmit queue on the server, but checking the TSval in the TCP
> > timestamp option shows that - for frame 983 which shows a 2second delay
> > - the TSval is also 2seconds later than the previous packet.  So the
> > delay happens before the TCP-level decision to create the packet.
> >
> > So I cannot see any real evidence to suggest a TCP-level problem.
> > The time of 2 or 4 seconds - and maybe even 90ms - seem unlikely to be
> > caused by an NFSd problem.
> >
> > So my guess is that the delay comes from the filesystem.  Maybe.
> > What filesystem are you exporting?
> >
> > How can we check this? Probably by turning on nfsd tracing.
> > There are a bunch of tracepoints that related to reading:
> >
> >         trace_nfsd_read_start
> >         trace_nfsd_read_done
> >         trace_nfsd_read_io_done
> >         trace_nfsd_read_err
> >         trace_nfsd_read_splice
> >         trace_nfsd_read_vector
> >         trace_nfsd_read_start
> >         trace_nfsd_read_done
> >
> > Maybe enabling them might be useful as you should be able to see if the
> > delay was within one read request, or between two read requests.
> > But I don't have much (if any) experience in enabling trace points.  I
> > really should try that some day.  Maybe you can find guidance on using
> > these tracepoint somewhere ... or maybe you already know how :-)
> >
> > NeilBrown
> 
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: NFS server regression in kernel 5.13 (tested w/ 5.13.9)
  2021-08-16 23:01                       ` NeilBrown
@ 2021-08-20  0:31                         ` NeilBrown
  2021-08-20  0:52                           ` Mike Javorski
  0 siblings, 1 reply; 47+ messages in thread
From: NeilBrown @ 2021-08-20  0:31 UTC (permalink / raw)
  To: Mike Javorski; +Cc: Chuck Lever III, Linux NFS Mailing List

On Tue, 17 Aug 2021, NeilBrown wrote:
> On Tue, 17 Aug 2021, Mike Javorski wrote:
> > Thanks for the pointer Chuck
> > 
> > I ran the trace capture and was able to trigger two freezes in
> > relatively short order. Here is that trace file:
> > https://drive.google.com/file/d/1qk_VIMkj8aTeQIg5O0W3AvWyeDSWL3vW/view?usp=sharing
> > 
> 
> There are gaps of 5.3sec, 4sec, 1sec and 1sec between adjacent trace
> points.
> The longest gap between 'start' and 'done' for any given xid is 354msec.
> So it doesn't look like the filesystem read causing the problem.
> 
> The long gaps between adjacent records are:
> 0.354581 xid=0x4c6ac2b6
> 0.418340 xid=0x6a71c2b6
> 1.013260 xid=0x6f71c2b6
> 1.020742 xid=0x0f71c2b6
> 4.064527 xid=0x6171c2b6
> 5.396559 xid=0xd66dc2b6
> 
> The fact 1, 1, and 4 are so close to a precise number of seconds seems
> unlikely to be a co-incidence.  It isn't clear what it might mean
> though.
> 
> I don't have any immediae ideas where to look next.  I'll let you know
> if I come up with something.

I had a separate bug report
  https://bugzilla.suse.com/show_bug.cgi?id=1189508
with broadly similar symptoms which strongly points to a network-layer
problem.  So I'm digging back into that tcpdump.

The TCP ACK for READ requests usually goes out in 100 or 200
microseconds.  Sometimes longer - by a factor of about 100.

% tshark -r /tmp/javmorin-nfsfreeze-5.13.10-cap1\ .test.pcap  -d tcp.port==2049,rpc -Y 'tcp.port==2049 && rpc.msgtyp==0 && nfs.opcode==25' -T fields -e tcp.seq -e frame.time_epoch -e frame.number | sed 's/$/ READ/' > /tmp/read-times
% tshark -r /tmp/javmorin-nfsfreeze-5.13.10-cap1\ .test.pcap  -d tcp.port==2049,rpc -Y 'tcp.srcport==2049' -T fields -e tcp.ack -e frame.time_epoch -e frame.number | sed 's/$/ ACK/' > /tmp/ack-times
% sort -n /tmp/read-times /tmp/ack-times  | awk '$4 == "ACK" && readtime {printf "%f %d\n", ($2 - readtime), $3; readtime=0} $4 == "READ" {readtime=$2}' | sort -n | tail -4
0.000360 441
0.012777 982
0.013318 1086
0.042356 1776

Three times (out of 245) the TCP ACK was 12 milliseconds or longer.  The
first and last were times when the reply was delayed by a couple of
seconds at least.  The other (1086) was the reply to 1085 - no obvious
delay to the READ reply.

These unusual delays (short though they are) suggest something odd in
the network layer - particularly as they are sometimes followed by a
much larger delay in a READ reply.

It might be as simple as a memory allocation delay.  It might be
something deep in the networking layer.

If you still have (or could generate) some larger tcp dumps, you could
try that code and see if 5.12 shows any ACK delays, and if 5.13 shows
more ACK delays than actually turn into READ delays.  If 5.13 easily
shows a few ACK delays that 5.12 doesn't, then they might make a
git-bisect more reliable.  Having thousands of READs in the trace
rather than just 245 should produce more reliable data.
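If it does come to a bisect, the skeleton would be roughly:

  git bisect start
  git bisect bad v5.13
  git bisect good v5.12
  # then for each kernel git checks out: build, boot, measure the
  # ACK-delay fraction, and mark it with "git bisect good" or "git bisect bad"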


As I suggested in the bug report I linked above, I am suspicious of TCP
offload when I see symptoms like this.  You can use "ethtool" to turn
off the various offload features.  Doing that and trying to reproduce
without offload might also provide useful information.
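For example (the interface name here is just a placeholder):

  ethtool -k eth0      # list the current offload settings
  ethtool -K eth0 tso off gso off gro off tx off rx off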

NeilBrown

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: NFS server regression in kernel 5.13 (tested w/ 5.13.9)
  2021-08-20  0:31                         ` NeilBrown
@ 2021-08-20  0:52                           ` Mike Javorski
  2021-08-22  0:17                             ` Mike Javorski
  0 siblings, 1 reply; 47+ messages in thread
From: Mike Javorski @ 2021-08-20  0:52 UTC (permalink / raw)
  To: NeilBrown; +Cc: Chuck Lever III, Linux NFS Mailing List

Neil:

I did still have the original/source cap for that 5.13 file. I ran the
command you gave and got the last 10 entries:
0.013086 31352
0.013318 28399
0.013339 3521
0.013783 32423
0.014519 32481
0.014773 6145
0.015063 19034
0.015829 25892
0.042356 29089
0.042581 5779

I did have another test cap which was from an early 5.13 dev compile
prior to the nfs patches (what the Archlinux devs asked me to try). In
that file (which is 1.8G vs the 400M of the above file), the last 10
entries are:
0.042038 575
0.042190 87093
0.042313 30183
0.042377 34941
0.042521 53593
0.042801 113067
0.042930 1174
0.043285 87326
0.043851 61066
0.107649 115114

As that seems even worse than the above, I think I need to get a clean
5.12 capture to see if it actually was "better" beforehand.
Unfortunately I am stuck on a work issue tonight, but I will test it
either tomorrow or at the latest, this weekend.

Thanks for following up on this.

- mike

On Thu, Aug 19, 2021 at 5:31 PM NeilBrown <neilb@suse.de> wrote:
>
> On Tue, 17 Aug 2021, NeilBrown wrote:
> > On Tue, 17 Aug 2021, Mike Javorski wrote:
> > > Thanks for the pointer Chuck
> > >
> > > I ran the trace capture and was able to trigger two freezes in
> > > relatively short order. Here is that trace file:
> > > https://drive.google.com/file/d/1qk_VIMkj8aTeQIg5O0W3AvWyeDSWL3vW/view?usp=sharing
> > >
> >
> > There are gaps of 5.3sec, 4sec, 1sec and 1sec between adjacent trace
> > points.
> > The longest gap between 'start' and 'done' for any given xid is 354msec.
> > So it doesn't look like the filesystem read causing the problem.
> >
> > The long gaps between adjacent records are:
> > 0.354581 xid=0x4c6ac2b6
> > 0.418340 xid=0x6a71c2b6
> > 1.013260 xid=0x6f71c2b6
> > 1.020742 xid=0x0f71c2b6
> > 4.064527 xid=0x6171c2b6
> > 5.396559 xid=0xd66dc2b6
> >
> > The fact 1, 1, and 4 are so close to a precise number of seconds seems
> > unlikely to be a co-incidence.  It isn't clear what it might mean
> > though.
> >
> > I don't have any immediae ideas where to look next.  I'll let you know
> > if I come up with something.
>
> I had separate bug report
>   https://bugzilla.suse.com/show_bug.cgi?id=1189508
> with broadly similar symptoms which strongly points to network-layer
> problem.  So I'm digging back into that tcpdump.
>
> The TCP ACK for READ requests usually goes out in 100 or 200
> microseconds.  Sometimes longer - by a factor of about 100.
>
> % tshark -r /tmp/javmorin-nfsfreeze-5.13.10-cap1\ .test.pcap  -d tcp.port==2049,rpc -Y 'tcp.port==2049 && rpc.msgtyp==0 && nfs.opcode==25' -T fields -e tcp.seq -e frame.time_epoch -e frame.number | sed 's/$/ READ/' > /tmp/read-times
> % tshark -r /tmp/javmorin-nfsfreeze-5.13.10-cap1\ .test.pcap  -d tcp.port==2049,rpc -Y 'tcp.srcport==2049' -T fields -e tcp.ack -e frame.time_epoch -e frame.number | sed 's/$/ ACK/' > /tmp/ack-times
> % sort -n /tmp/read-times /tmp/ack-times  | awk '$4 == "ACK" && readtime {printf "%f %d\n", ($2 - readtime), $3; readtime=0} $4 == "READ" {readtime=$2}' | sort -n | tail -4
> 0.000360 441
> 0.012777 982
> 0.013318 1086
> 0.042356 1776
>
> Three times (out of 245) the TCP ACK was 12 milliseconds or longer.  The
> first and last were times when the reply was delayed by a couple of
> seconds at least.  The other (1086) was the reply to 1085 - no obvious
> delay to the READ reply.
>
> These unusual delays (short though they are) suggest something odd in
> the network layer - particularly as they are sometimes followed by a
> much larger delay in a READ reply.
>
> It might be as simple as a memory allocation delay.  It might be
> something deep in the networking layer.
>
> If you still have (or could generate) some larger tcp dumps, you could
> try that code and see if 5.12 shows any ACK delays, and if 5.13 shows
> more delays than turn into READ delays.  IF 5.13 easily shows a few ACK
> delays thats 5.12 doesn't, then they might make a git-bisect more
> reliable.  Having thousands of READs in the trace rather than just 245
> should produce more reliable data.
>
>
> As I suggested in the bug report I linked above, I am suspicious of TCP
> offload when I see symptoms like this.  You can use "ethtool" to turn
> off that various offload features.  Doing that and trying to reproduce
> without offload might also provide useful information.
>
> NeilBrown

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: NFS server regression in kernel 5.13 (tested w/ 5.13.9)
  2021-08-20  0:52                           ` Mike Javorski
@ 2021-08-22  0:17                             ` Mike Javorski
  2021-08-22  3:41                               ` NeilBrown
  0 siblings, 1 reply; 47+ messages in thread
From: Mike Javorski @ 2021-08-22  0:17 UTC (permalink / raw)
  To: NeilBrown; +Cc: Chuck Lever III, Linux NFS Mailing List

OK, so new/fresh captures, reading the same set of files via NFS in
roughly the same timing/sequence (client unchanged between runs)

5.12.15-arch1:
===============
0.042320 124082
0.042594 45837
0.043176 19598
0.044092 63667
0.044613 28192
0.045045 131268
0.045982 116572
0.058507 162444
0.369620 153520
0.825167 164682

5.13.12-arch1: (no freezes)
===============
0.040766 12679
0.041565 64532
0.041799 55440
0.042091 159640
0.042105 75075
0.042134 177776
0.042706 40
0.043334 35322
0.045480 183618
0.204246 83997

Since I didn't get any freezes, I waited a bit, tried again and got a
capture with several freezes...

5.13.12-arch1: (with freezes)
===============
0.042143 55425
0.042252 64787
0.042411 57362
0.042441 34461
0.042503 67041
0.042553 64812
0.042592 55179
0.042715 67002
0.042835 66977
0.043308 64849

Not sure what to make of this, but to my (admittedly untrained) eye,
the two 5.13.12 sets are very similar to each other as well as to the
5.12.15 sample, so I am not sure whether this gives any indication of
what is causing the freezes.

- mike




On Thu, Aug 19, 2021 at 5:52 PM Mike Javorski <mike.javorski@gmail.com> wrote:
>
> Neil:
>
> I did still have the original/source cap for that 5.13 file. I ran the
> command you gave and got the 10 last entries:
> 0.013086 31352
> 0.013318 28399
> 0.013339 3521
> 0.013783 32423
> 0.014519 32481
> 0.014773 6145
> 0.015063 19034
> 0.015829 25892
> 0.042356 29089
> 0.042581 5779
>
> I did have another test cap which was from an early 5.13 dev compile
> prior to the nfs patches (what the Archlinux devs asked me to try). In
> that file (which is 1.8G vs the 400M of the above file), the 10 last
> entries are:
> 0.042038 575
> 0.042190 87093
> 0.042313 30183
> 0.042377 34941
> 0.042521 53593
> 0.042801 113067
> 0.042930 1174
> 0.043285 87326
> 0.043851 61066
> 0.107649 115114
>
> As that seems even worse than the above, I think I need to get a clean
> 5.12 capture to see if it actually was "better" beforehand.
> Unfortunately I am stuck on a work issue tonight, but I will test it
> either tomorrow or at the latest, this weekend.
>
> Thanks for following up on this.
>
> - mike
>
> On Thu, Aug 19, 2021 at 5:31 PM NeilBrown <neilb@suse.de> wrote:
> >
> > On Tue, 17 Aug 2021, NeilBrown wrote:
> > > On Tue, 17 Aug 2021, Mike Javorski wrote:
> > > > Thanks for the pointer Chuck
> > > >
> > > > I ran the trace capture and was able to trigger two freezes in
> > > > relatively short order. Here is that trace file:
> > > > https://drive.google.com/file/d/1qk_VIMkj8aTeQIg5O0W3AvWyeDSWL3vW/view?usp=sharing
> > > >
> > >
> > > There are gaps of 5.3sec, 4sec, 1sec and 1sec between adjacent trace
> > > points.
> > > The longest gap between 'start' and 'done' for any given xid is 354msec.
> > > So it doesn't look like the filesystem read causing the problem.
> > >
> > > The long gaps between adjacent records are:
> > > 0.354581 xid=0x4c6ac2b6
> > > 0.418340 xid=0x6a71c2b6
> > > 1.013260 xid=0x6f71c2b6
> > > 1.020742 xid=0x0f71c2b6
> > > 4.064527 xid=0x6171c2b6
> > > 5.396559 xid=0xd66dc2b6
> > >
> > > The fact 1, 1, and 4 are so close to a precise number of seconds seems
> > > unlikely to be a co-incidence.  It isn't clear what it might mean
> > > though.
> > >
> > > I don't have any immediae ideas where to look next.  I'll let you know
> > > if I come up with something.
> >
> > I had separate bug report
> >   https://bugzilla.suse.com/show_bug.cgi?id=1189508
> > with broadly similar symptoms which strongly points to network-layer
> > problem.  So I'm digging back into that tcpdump.
> >
> > The TCP ACK for READ requests usually goes out in 100 or 200
> > microseconds.  Sometimes longer - by a factor of about 100.
> >
> > % tshark -r /tmp/javmorin-nfsfreeze-5.13.10-cap1\ .test.pcap  -d tcp.port==2049,rpc -Y 'tcp.port==2049 && rpc.msgtyp==0 && nfs.opcode==25' -T fields -e tcp.seq -e frame.time_epoch -e frame.number | sed 's/$/ READ/' > /tmp/read-times
> > % tshark -r /tmp/javmorin-nfsfreeze-5.13.10-cap1\ .test.pcap  -d tcp.port==2049,rpc -Y 'tcp.srcport==2049' -T fields -e tcp.ack -e frame.time_epoch -e frame.number | sed 's/$/ ACK/' > /tmp/ack-times
> > % sort -n /tmp/read-times /tmp/ack-times  | awk '$4 == "ACK" && readtime {printf "%f %d\n", ($2 - readtime), $3; readtime=0} $4 == "READ" {readtime=$2}' | sort -n | tail -4
> > 0.000360 441
> > 0.012777 982
> > 0.013318 1086
> > 0.042356 1776
> >
> > Three times (out of 245) the TCP ACK was 12 milliseconds or longer.  The
> > first and last were times when the reply was delayed by a couple of
> > seconds at least.  The other (1086) was the reply to 1085 - no obvious
> > delay to the READ reply.
> >
> > These unusual delays (short though they are) suggest something odd in
> > the network layer - particularly as they are sometimes followed by a
> > much larger delay in a READ reply.
> >
> > It might be as simple as a memory allocation delay.  It might be
> > something deep in the networking layer.
> >
> > If you still have (or could generate) some larger tcp dumps, you could
> > try that code and see if 5.12 shows any ACK delays, and if 5.13 shows
> > more delays than turn into READ delays.  IF 5.13 easily shows a few ACK
> > delays thats 5.12 doesn't, then they might make a git-bisect more
> > reliable.  Having thousands of READs in the trace rather than just 245
> > should produce more reliable data.
> >
> >
> > As I suggested in the bug report I linked above, I am suspicious of TCP
> > offload when I see symptoms like this.  You can use "ethtool" to turn
> > off that various offload features.  Doing that and trying to reproduce
> > without offload might also provide useful information.
> >
> > NeilBrown

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: NFS server regression in kernel 5.13 (tested w/ 5.13.9)
  2021-08-22  0:17                             ` Mike Javorski
@ 2021-08-22  3:41                               ` NeilBrown
  2021-08-22  4:05                                 ` Mike Javorski
  0 siblings, 1 reply; 47+ messages in thread
From: NeilBrown @ 2021-08-22  3:41 UTC (permalink / raw)
  To: Mike Javorski; +Cc: Chuck Lever III, Linux NFS Mailing List

On Sun, 22 Aug 2021, Mike Javorski wrote:
> OK, so new/fresh captures, reading the same set of files via NFS in
> roughly the same timing/sequence (client unchanged between runs)
> 
> 5.12.15-arch1:
> ===============
> 0.042320 124082
> 0.042594 45837
> 0.043176 19598
> 0.044092 63667
> 0.044613 28192
> 0.045045 131268
> 0.045982 116572
> 0.058507 162444
> 0.369620 153520
> 0.825167 164682
> 
> 5.13.12-arch1: (no freezes)
> ===============
> 0.040766 12679
> 0.041565 64532
> 0.041799 55440
> 0.042091 159640
> 0.042105 75075
> 0.042134 177776
> 0.042706 40
> 0.043334 35322
> 0.045480 183618
> 0.204246 83997
> 
> Since I didn't get any freezes, I waited a bit, tried again and got a
> capture with several freezes...
> 
> 5.13.12-arch1: (with freezes)
> ===============
> 0.042143 55425
> 0.042252 64787
> 0.042411 57362
> 0.042441 34461
> 0.042503 67041
> 0.042553 64812
> 0.042592 55179
> 0.042715 67002
> 0.042835 66977
> 0.043308 64849
> 
> Not sure what to make of this, but to my (admittedly untrainted) eye,
> the two 5.13.12 sets are very similar to each other as well as to the
> 5.12.15 sample, I am not sure if this is giving any indication to what
> is causing the freezes.

In the trace that I have, most times (242 of 245) were 0.000360 or less.
Only 3 were greater.
As your traces are much bigger you naturally have more that are greater -
all of the last 10 and probably more.

I was hoping that 5.12 wouldn't show any large delays, but clearly it
does.  However it is still possible that there are substantially fewer.

Rather than using tail, please pipe the list of times into

 awk '$1 < 0.001 { fast +=1 } $1 >= 0.001 {slow += 1} END { print fast, slow, slow / (fast + slow) }'

and report the results.  If the final number (the fraction) is reliably
significantly smaller for 5.12 than for 5.13 (whether it freezes or
not), then this metric may still be useful.

By the way, disabling the various offload options didn't appear to make
a difference for the other person who has reported this problem.

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: NFS server regression in kernel 5.13 (tested w/ 5.13.9)
  2021-08-22  3:41                               ` NeilBrown
@ 2021-08-22  4:05                                 ` Mike Javorski
  2021-08-22 22:00                                   ` NeilBrown
  0 siblings, 1 reply; 47+ messages in thread
From: Mike Javorski @ 2021-08-22  4:05 UTC (permalink / raw)
  To: NeilBrown; +Cc: Chuck Lever III, Linux NFS Mailing List

5.12.15-arch1:
===============
14723 137 0.00921938

5.13.12-arch1: (no freezes)
===============
15333 206 0.013257

5.13.12-arch1: (with freezes)
===============
9230 299 0.0313779

So a decent bump w/ 5.13, and even more with the freezes.
This machine is an older Opteron server w/ nforce networking. Not sure
how much offloading it actually does.

- mike

On Sat, Aug 21, 2021 at 8:42 PM NeilBrown <neilb@suse.de> wrote:
>
> On Sun, 22 Aug 2021, Mike Javorski wrote:
> > OK, so new/fresh captures, reading the same set of files via NFS in
> > roughly the same timing/sequence (client unchanged between runs)
> >
> > 5.12.15-arch1:
> > ===============
> > 0.042320 124082
> > 0.042594 45837
> > 0.043176 19598
> > 0.044092 63667
> > 0.044613 28192
> > 0.045045 131268
> > 0.045982 116572
> > 0.058507 162444
> > 0.369620 153520
> > 0.825167 164682
> >
> > 5.13.12-arch1: (no freezes)
> > ===============
> > 0.040766 12679
> > 0.041565 64532
> > 0.041799 55440
> > 0.042091 159640
> > 0.042105 75075
> > 0.042134 177776
> > 0.042706 40
> > 0.043334 35322
> > 0.045480 183618
> > 0.204246 83997
> >
> > Since I didn't get any freezes, I waited a bit, tried again and got a
> > capture with several freezes...
> >
> > 5.13.12-arch1: (with freezes)
> > ===============
> > 0.042143 55425
> > 0.042252 64787
> > 0.042411 57362
> > 0.042441 34461
> > 0.042503 67041
> > 0.042553 64812
> > 0.042592 55179
> > 0.042715 67002
> > 0.042835 66977
> > 0.043308 64849
> >
> > Not sure what to make of this, but to my (admittedly untrainted) eye,
> > the two 5.13.12 sets are very similar to each other as well as to the
> > 5.12.15 sample, I am not sure if this is giving any indication to what
> > is causing the freezes.
>
> In the trace that I have, most times (242 of 245) were 0.000360 or less.
> Only 3 were greater.
> As your traces are much bigger you naturally have more that are great -
> all of the last 10 and probably more.
>
> I has hoping that 5.12 wouldn't show any large delays, but clearly it
> does.  However it is still possible that there are substantially fewer.
>
> Rather than using tail, please pipe the list of times into
>
>  awk '$1 < 0.001 { fast +=1 } $1 >= 0.001 {slow += 1} END { print fast, slow, slow / (fast + slow) }'
>
> and report the results.  If the final number (the fraction) is reliably
> significantly smaller for 5.12 than for 5.13 (whether it freezes or
> not), then this metric may still be useful.
>
> By the way, disabling the various offload options didn't appear to make
> a different for the other person who has reported this problem.
>
> Thanks,
> NeilBrown

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: NFS server regression in kernel 5.13 (tested w/ 5.13.9)
  2021-08-22  4:05                                 ` Mike Javorski
@ 2021-08-22 22:00                                   ` NeilBrown
  2021-08-26 19:34                                     ` Mike Javorski
  0 siblings, 1 reply; 47+ messages in thread
From: NeilBrown @ 2021-08-22 22:00 UTC (permalink / raw)
  To: Mike Javorski; +Cc: Chuck Lever III, Linux NFS Mailing List

On Sun, 22 Aug 2021, Mike Javorski wrote:
> 5.12.15-arch1:
> ===============
> 14723 137 0.00921938
> 
> 5.13.12-arch1: (no freezes)
> ===============
> 15333 206 0.013257
> 
> 5.13.12-arch1: (with freezes)
> ===============
> 9230 299 0.0313779
> 
> So a decent bump w/ 5.13, and even more with the freezes.

Thanks.  9->13 isn't as much as I was hoping for, but it might still be
significant. 

> This machine is an older Opteron server w/ nforce networking. Not sure
> how much offloading it actually does.

The tcpdump showed the server sending very large packets, which means
that it is at least doing TCP segmentation offload and checksum offload.
Those are very common these days.
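
(If you want to rule the offloads out, ethtool is the usual way to check
and toggle them - "eth0" below is just a placeholder for your interface,
and the settings revert on reboot:

  ethtool -k eth0 | grep -E 'segmentation|checksum'
  ethtool -K eth0 tso off gso off tx off

where "tx off" disables transmit checksum offload.)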

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: NFS server regression in kernel 5.13 (tested w/ 5.13.9)
  2021-08-22 22:00                                   ` NeilBrown
@ 2021-08-26 19:34                                     ` Mike Javorski
  2021-08-26 21:44                                       ` NeilBrown
  0 siblings, 1 reply; 47+ messages in thread
From: Mike Javorski @ 2021-08-26 19:34 UTC (permalink / raw)
  To: NeilBrown; +Cc: Chuck Lever III, Linux NFS Mailing List

I finally had the chance to test with the offloading disabled on the
server. Unfortunately, the issues remain, and may even be more frequent,
but I can't rule out confirmation bias on that one.

Not sure where to go from here.

- mike

On Sun, Aug 22, 2021 at 3:00 PM NeilBrown <neilb@suse.de> wrote:
>
> On Sun, 22 Aug 2021, Mike Javorski wrote:
> > 5.12.15-arch1:
> > ===============
> > 14723 137 0.00921938
> >
> > 5.13.12-arch1: (no freezes)
> > ===============
> > 15333 206 0.013257
> >
> > 5.13.12-arch1: (with freezes)
> > ===============
> > 9230 299 0.0313779
> >
> > So a decent bump w/ 5.13, and even more with the freezes.
>
> Thanks.  9->13 isn't as much as I was hoping for, but it might still be
> significant.
>
> > This machine is an older Opteron server w/ nforce networking. Not sure
> > how much offloading it actually does.
>
> The tcpdump showed the server sending very large packets, which means
> that it is at least doing TCP segmentation offload and checksum offload.
> Those are very common these days.
>
> Thanks,
> NeilBrown

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: NFS server regression in kernel 5.13 (tested w/ 5.13.9)
  2021-08-26 19:34                                     ` Mike Javorski
@ 2021-08-26 21:44                                       ` NeilBrown
  2021-08-27  0:07                                         ` Mike Javorski
  0 siblings, 1 reply; 47+ messages in thread
From: NeilBrown @ 2021-08-26 21:44 UTC (permalink / raw)
  To: Mike Javorski; +Cc: Chuck Lever III, Linux NFS Mailing List

On Fri, 27 Aug 2021, Mike Javorski wrote:
> I finally had the chance to test with the offloading disabled on the
> server. Unfortunately, the issues remain, may even be more frequent,
> but I can't rule out confirmation bias on that one.
> 
> Not sure where to go from here.

Have you tried NFSv3?  Or 4.0?
I am (or was) leaning away from thinking it was an nfsd problem, but we
cannot rule anything out until we know what is happening.

I plan to have another dig through the code changes today.

NeilBrown

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: NFS server regression in kernel 5.13 (tested w/ 5.13.9)
  2021-08-26 21:44                                       ` NeilBrown
@ 2021-08-27  0:07                                         ` Mike Javorski
  2021-08-27  5:27                                           ` NeilBrown
  0 siblings, 1 reply; 47+ messages in thread
From: Mike Javorski @ 2021-08-27  0:07 UTC (permalink / raw)
  To: NeilBrown; +Cc: Chuck Lever III, Linux NFS Mailing List

I just tried the same mount with 4 different nfsvers values: 3, 4.0, 4.1 and 4.2

At first I thought it might be "working" because I initially only got
freezes with 4.2, but I went back and re-tested (to be sure) and got
freezes with all 4 versions. So the nfsvers setting doesn't seem to
have an impact. I did verify at each pass that the 'nfsvers=' value
was present and correct in the mount output.

FYI: another user posted on the archlinux reddit with a similar issue;
I suggested they try with a 5.12 kernel and that "solved" the issue
for them as well.

- mike

On Thu, Aug 26, 2021 at 2:44 PM NeilBrown <neilb@suse.de> wrote:
>
> On Fri, 27 Aug 2021, Mike Javorski wrote:
> > I finally had the chance to test with the offloading disabled on the
> > server. Unfortunately, the issues remain, may even be more frequent,
> > but I can't rule out confirmation bias on that one.
> >
> > Not sure where to go from here.
>
> Have you tried NFSv3?  Or 4.0?
> I am (or was) leaning away from thinking it was an nfsd problem, but we
> cannot rule anything out until we know that is happening.
>
> I plan to have another dig through the code changes today.
>
> NeilBrown

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: NFS server regression in kernel 5.13 (tested w/ 5.13.9)
  2021-08-27  0:07                                         ` Mike Javorski
@ 2021-08-27  5:27                                           ` NeilBrown
  2021-08-27  6:11                                             ` Mike Javorski
  0 siblings, 1 reply; 47+ messages in thread
From: NeilBrown @ 2021-08-27  5:27 UTC (permalink / raw)
  To: Mike Javorski, Mel Gorman; +Cc: Chuck Lever III, Linux NFS Mailing List


[[Mel: if you read through to the end you'll see why I cc:ed you on this]]

On Fri, 27 Aug 2021, Mike Javorski wrote:
> I just tried the same mount with 4 different nfsvers values: 3, 4.0, 4.1 and 4.2
> 
> At first I thought it might be "working" because I only got freezes
> with 4.2 at first, but I went back and re-tested (to be sure) and got
> freezes with all 4 versions. So the nfsvers setting doesn't seem to
> have an impact. I did verify at each pass that the 'nfsvers=' value
> was present and correct in the mount output.
> 
> FYI: another user posted on the archlinux reddit with a similar issue,
> I suggested they try with a 5.12 kernel and that "solved" the issue
> for them as well.

well... I have good news and I have bad news.

First the good.
I reviewed all the symptoms again, and browsed the commits between
working and not-working, and the only pattern that made any sense was
that there was some issue with memory allocation.  The pauses - I
reasoned - were most likely pauses while allocating memory.

So instead of testing in a VM with 2G of RAM, I tried 512MB, and
suddenly the problem was trivial to reproduce.  Specifically I created a
(sparse) 1GB file on the test VM, exported it over NFS, and ran "md5sum"
on the file from an NFS client.  With 5.12 this reliably takes about 90 seconds
(as it does with 2G RAM).  On 5.13 and 512MB RAM, it usually takes a lot
longer.  5, 6, 7, 8 minutes (and assorted seconds).
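
For reference, the reproduction amounts to roughly this (paths, export
options and the mount point are only placeholders, not the exact
commands I used):

  # on the server (the 512MB VM)
  truncate -s 1G /export/test/sparse.img
  exportfs -o rw,sync,no_subtree_check '*:/export/test'
  # on the client
  mount -t nfs server:/export/test /mnt
  time md5sum /mnt/sparse.img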

The most questionable nfsd/memory-related patch in 5.13 is

 Commit f6e70aab9dfe ("SUNRPC: refresh rq_pages using a bulk page allocator")

I reverted that and now the problem is no longer there.  Gone.  90 seconds
every time.

Now the bad news: I don't know why.  That patch should be a good patch,
with a small performance improvement, particularly at very high loads.
(maybe even a big improvement at very high loads).
The problem must be in alloc_pages_bulk_array(), which is a new
interface, so not possible to bisect.

So I might have a look at the code next week, but I've cc:ed Mel Gorman
in case he comes up with some ideas sooner.

For now, you can just revert that patch.

Thanks for all the testing you did!!  It certainly helped.

NeilBrown


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: NFS server regression in kernel 5.13 (tested w/ 5.13.9)
  2021-08-27  5:27                                           ` NeilBrown
@ 2021-08-27  6:11                                             ` Mike Javorski
  2021-08-27  7:14                                               ` NeilBrown
  0 siblings, 1 reply; 47+ messages in thread
From: Mike Javorski @ 2021-08-27  6:11 UTC (permalink / raw)
  To: NeilBrown; +Cc: Mel Gorman, Chuck Lever III, Linux NFS Mailing List

Neil:

I am actually compiling a 5.13.13 kernel with the patch that Chuck
suggested earlier right now. I am doing the full compile matching the
distro compile as I don't have a targeted kernel config ready to go
(it's been years), and I want to test like for like anyway. It should
be ready to install in the AM, my time, so I will test with that first
tomorrow and see if it resolves the issue; if not, I will report back
and then try your revert suggestion. On the issue of memory though, my
server has 16GB of memory (and free currently shows ~1GB unused, and
~11GB in buffers/caches), so this really shouldn't be an available
memory issue, but I guess we'll find out.

Thanks for the info.

- mike

On Thu, Aug 26, 2021 at 10:27 PM NeilBrown <neilb@suse.de> wrote:
>
>
> [[Mel: if you read through to the end you'll see why I cc:ed you on this]]
>
> On Fri, 27 Aug 2021, Mike Javorski wrote:
> > I just tried the same mount with 4 different nfsvers values: 3, 4.0, 4.1 and 4.2
> >
> > At first I thought it might be "working" because I only got freezes
> > with 4.2 at first, but I went back and re-tested (to be sure) and got
> > freezes with all 4 versions. So the nfsvers setting doesn't seem to
> > have an impact. I did verify at each pass that the 'nfsvers=' value
> > was present and correct in the mount output.
> >
> > FYI: another user posted on the archlinux reddit with a similar issue,
> > I suggested they try with a 5.12 kernel and that "solved" the issue
> > for them as well.
>
> well... I have good news and I have bad news.
>
> First the good.
> I reviewed all the symptoms again, and browsed the commits between
> working and not-working, and the only pattern that made any sense was
> that there was some issue with memory allocation.  The pauses - I
> reasoned - were most likely pauses while allocating memory.
>
> So instead of testing in a VM with 2G of RAM, I tried 512MB, and
> suddenly the problem was trivial to reproduce.  Specifically I created a
> (sparse) 1GB file on the test VM, exported it over NFS, and ran "md5sum"
> on the file from an NFS client.  With 5.12 this reliably takes about 90 seconds
> (as it does with 2G RAM).  On 5.13 and 512MB RAM, it usually takes a lot
> longer.  5, 6, 7, 8 minutes (and assorted seconds).
>
> The most questionable nfsd/ memory related patch in 5.13 is
>
>  Commit f6e70aab9dfe ("SUNRPC: refresh rq_pages using a bulk page allocator")
>
> I reverted that and now the problem is no longer there.  Gone.  90seconds
> every time.
>
> Now the bad news: I don't know why.  That patch should be a good patch,
> with a small performance improvement, particularly at very high loads.
> (maybe even a big improvement at very high loads).
> The problem must be in alloc_pages_bulk_array(), which is a new
> interface, so not possible to bisect.
>
> So I might have a look at the code next week, but I've cc:ed Mel Gorman
> in case he comes up with some ideas sooner.
>
> For now, you can just revert that patch.
>
> Thanks for all the testing you did!!  It certainly helped.
>
> NeilBrown
>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: NFS server regression in kernel 5.13 (tested w/ 5.13.9)
  2021-08-27  6:11                                             ` Mike Javorski
@ 2021-08-27  7:14                                               ` NeilBrown
  2021-08-27 14:13                                                 ` Chuck Lever III
  0 siblings, 1 reply; 47+ messages in thread
From: NeilBrown @ 2021-08-27  7:14 UTC (permalink / raw)
  To: Mike Javorski; +Cc: Mel Gorman, Chuck Lever III, Linux NFS Mailing List

On Fri, 27 Aug 2021, Mike Javorski wrote:
> Neil:
> 
> I am actually compiling a 5.13.13 kernel with the patch that Chuck
> suggested earlier right now. I am doing the full compile matching the
> distro compile as I don't have a targeted kernel config ready to go
> (it's been years), and I want to test like for like anyway. It should
> be ready to install in the AM, my time, so I will test with that first
> tomorrow and see if it resolves the issue, if not, I will report back
> and then try your revert suggestion. On the issue of memory though, my
> server has 16GB of memory (and free currently shows ~1GB unused, and
> ~11GB in buffers/caches), so this really shouldn't be an available
> memory issue, but I guess we'll find out.
> 
> Thanks for the info.

Take your time.

Just FYI, the fix Chuck identified doesn't match your symptoms.
That bug can only occur if
   /sys/module/sunrpc/parameters/svc_rpc_per_connection_limit
is non-zero.  When it does occur, the TCP connection completely
freezes - no further traffic.  It won't even close.
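
(You can check that setting with:

  cat /sys/module/sunrpc/parameters/svc_rpc_per_connection_limit

0 - the default - means the limit is disabled.)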

I took a break and got some fresh air and now I understand the problem.
Please try the patch below, not the revert I suggested.
The pause can, I think, be caused by fragmented memory - not just low
memory.  If only 1/16 of your memory is free, it could easily be
fragmented.

Thanks,
NeilBrown

Subject: [PATCH] SUNRPC: don't pause on incomplete allocation

alloc_pages_bulk_array() attempts to allocate at least one page based on
the provided pages, and then opportunistically allocates more if that
can be done without dropping the spinlock.

So if it returns fewer than requested, that could just mean that it
needed to drop the lock.  In that case, try again immediately.

Only pause for a time if no progress could be made.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 net/sunrpc/svc_xprt.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index d66a8e44a1ae..99268dd95519 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -662,7 +662,7 @@ static int svc_alloc_arg(struct svc_rqst *rqstp)
 {
 	struct svc_serv *serv = rqstp->rq_server;
 	struct xdr_buf *arg = &rqstp->rq_arg;
-	unsigned long pages, filled;
+	unsigned long pages, filled, prev;
 
 	pages = (serv->sv_max_mesg + 2 * PAGE_SIZE) >> PAGE_SHIFT;
 	if (pages > RPCSVC_MAXPAGES) {
@@ -672,11 +672,14 @@ static int svc_alloc_arg(struct svc_rqst *rqstp)
 		pages = RPCSVC_MAXPAGES;
 	}
 
-	for (;;) {
+	for (prev = 0;; prev = filled) {
 		filled = alloc_pages_bulk_array(GFP_KERNEL, pages,
 						rqstp->rq_pages);
 		if (filled == pages)
 			break;
+		if (filled > prev)
+			/* Made progress, don't sleep yet */
+			continue;
 
 		set_current_state(TASK_INTERRUPTIBLE);
 		if (signalled() || kthread_should_stop()) {

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: NFS server regression in kernel 5.13 (tested w/ 5.13.9)
  2021-08-27  7:14                                               ` NeilBrown
@ 2021-08-27 14:13                                                 ` Chuck Lever III
  2021-08-27 17:07                                                   ` Mike Javorski
  0 siblings, 1 reply; 47+ messages in thread
From: Chuck Lever III @ 2021-08-27 14:13 UTC (permalink / raw)
  To: Neil Brown; +Cc: Mike Javorski, Mel Gorman, Linux NFS Mailing List


> On Aug 27, 2021, at 3:14 AM, NeilBrown <neilb@suse.de> wrote:
> 
> Subject: [PATCH] SUNRPC: don't pause on incomplete allocation
> 
> alloc_pages_bulk_array() attempts to allocate at least one page based on
> the provided pages, and then opportunistically allocates more if that
> can be done without dropping the spinlock.
> 
> So if it returns fewer than requested, that could just mean that it
> needed to drop the lock.  In that case, try again immediately.
> 
> Only pause for a time if no progress could be made.

The case I was worried about was "no pages available on the
pcplist", in which case, alloc_pages_bulk_array() resorts
to calling __alloc_pages() and returns only one new page.

"No progess" would mean even __alloc_pages() failed.

So this patch would behave essentially like the
pre-alloc_pages_bulk_array() code: call alloc_page() for
each empty struct_page in the array without pausing. That
seems correct to me.


I would add

Fixes: f6e70aab9dfe ("SUNRPC: refresh rq_pages using a bulk page allocator")


> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
> net/sunrpc/svc_xprt.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> index d66a8e44a1ae..99268dd95519 100644
> --- a/net/sunrpc/svc_xprt.c
> +++ b/net/sunrpc/svc_xprt.c
> @@ -662,7 +662,7 @@ static int svc_alloc_arg(struct svc_rqst *rqstp)
> {
> 	struct svc_serv *serv = rqstp->rq_server;
> 	struct xdr_buf *arg = &rqstp->rq_arg;
> -	unsigned long pages, filled;
> +	unsigned long pages, filled, prev;
> 
> 	pages = (serv->sv_max_mesg + 2 * PAGE_SIZE) >> PAGE_SHIFT;
> 	if (pages > RPCSVC_MAXPAGES) {
> @@ -672,11 +672,14 @@ static int svc_alloc_arg(struct svc_rqst *rqstp)
> 		pages = RPCSVC_MAXPAGES;
> 	}
> 
> -	for (;;) {
> +	for (prev = 0;; prev = filled) {
> 		filled = alloc_pages_bulk_array(GFP_KERNEL, pages,
> 						rqstp->rq_pages);
> 		if (filled == pages)
> 			break;
> +		if (filled > prev)
> +			/* Made progress, don't sleep yet */
> +			continue;
> 
> 		set_current_state(TASK_INTERRUPTIBLE);
> 		if (signalled() || kthread_should_stop()) {

--
Chuck Lever




^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: NFS server regression in kernel 5.13 (tested w/ 5.13.9)
  2021-08-27 14:13                                                 ` Chuck Lever III
@ 2021-08-27 17:07                                                   ` Mike Javorski
  2021-08-27 22:00                                                     ` Mike Javorski
  0 siblings, 1 reply; 47+ messages in thread
From: Mike Javorski @ 2021-08-27 17:07 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Neil Brown, Mel Gorman, Linux NFS Mailing List

Chuck:
I just booted a 5.13.13 kernel with your suggested patch. No freezes
on the first test, but that sometimes happens anyway, so I will let the server
settle some and try it again later in the day (which also would align
with Neil's comment on memory fragmentation being a contributor).

Neil:
I have started a compile with the above kernel + your patch to test
next unless you or Chuck determine that it isn't needed, or that I
should test both patches separately. As the above is already merged to
5.14 it seemed logical to just add your patch on top.

I will also try to set up a vm to test your md5sum scenario with the
various kernels since it's a much faster thing to test.

- mike

On Fri, Aug 27, 2021 at 7:13 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
>
>
> > On Aug 27, 2021, at 3:14 AM, NeilBrown <neilb@suse.de> wrote:
> >
> > Subject: [PATCH] SUNRPC: don't pause on incomplete allocation
> >
> > alloc_pages_bulk_array() attempts to allocate at least one page based on
> > the provided pages, and then opportunistically allocates more if that
> > can be done without dropping the spinlock.
> >
> > So if it returns fewer than requested, that could just mean that it
> > needed to drop the lock.  In that case, try again immediately.
> >
> > Only pause for a time if no progress could be made.
>
> The case I was worried about was "no pages available on the
> pcplist", in which case, alloc_pages_bulk_array() resorts
> to calling __alloc_pages() and returns only one new page.
>
> "No progess" would mean even __alloc_pages() failed.
>
> So this patch would behave essentially like the
> pre-alloc_pages_bulk_array() code: call alloc_page() for
> each empty struct_page in the array without pausing. That
> seems correct to me.
>
>
> I would add
>
> Fixes: f6e70aab9dfe ("SUNRPC: refresh rq_pages using a bulk page allocator")
>
>
> > Signed-off-by: NeilBrown <neilb@suse.de>
> > ---
> > net/sunrpc/svc_xprt.c | 7 +++++--
> > 1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> > index d66a8e44a1ae..99268dd95519 100644
> > --- a/net/sunrpc/svc_xprt.c
> > +++ b/net/sunrpc/svc_xprt.c
> > @@ -662,7 +662,7 @@ static int svc_alloc_arg(struct svc_rqst *rqstp)
> > {
> >       struct svc_serv *serv = rqstp->rq_server;
> >       struct xdr_buf *arg = &rqstp->rq_arg;
> > -     unsigned long pages, filled;
> > +     unsigned long pages, filled, prev;
> >
> >       pages = (serv->sv_max_mesg + 2 * PAGE_SIZE) >> PAGE_SHIFT;
> >       if (pages > RPCSVC_MAXPAGES) {
> > @@ -672,11 +672,14 @@ static int svc_alloc_arg(struct svc_rqst *rqstp)
> >               pages = RPCSVC_MAXPAGES;
> >       }
> >
> > -     for (;;) {
> > +     for (prev = 0;; prev = filled) {
> >               filled = alloc_pages_bulk_array(GFP_KERNEL, pages,
> >                                               rqstp->rq_pages);
> >               if (filled == pages)
> >                       break;
> > +             if (filled > prev)
> > +                     /* Made progress, don't sleep yet */
> > +                     continue;
> >
> >               set_current_state(TASK_INTERRUPTIBLE);
> >               if (signalled() || kthread_should_stop()) {
>
> --
> Chuck Lever
>
>
>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: NFS server regression in kernel 5.13 (tested w/ 5.13.9)
  2021-08-27 17:07                                                   ` Mike Javorski
@ 2021-08-27 22:00                                                     ` Mike Javorski
  2021-08-27 23:49                                                       ` Chuck Lever III
  0 siblings, 1 reply; 47+ messages in thread
From: Mike Javorski @ 2021-08-27 22:00 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Neil Brown, Mel Gorman, Linux NFS Mailing List

OK, an update. Several hours of spaced out testing sessions and the
first patch seems to have resolved the issue. There may be a very tiny
bit of lag that still occurs when opening/processing new files, but so
far on this kernel I have not had any multi-second freezes. I am still
waiting on the kernel with Neil's patch to compile (compiling on this
underpowered server so it's taking several hours), but I think the
testing there will just be to see if I can show it works still, and
then to try and test in a memory-constrained VM, to see if I can
recreate Neil's experiment. I will likely have to do this over the
weekend given the kernel compile delay + fiddling with a VM.

Chuck: I don't mean to overstep bounds, but is it possible to get that
patch pulled into 5.13 stable? That may help things for several people
while 5.14 goes through its shakedown in archlinux prior to release.

- mike

On Fri, Aug 27, 2021 at 10:07 AM Mike Javorski <mike.javorski@gmail.com> wrote:
>
> Chuck:
> I just booted a 5.13.13 kernel with your suggested patch. No freezes
> on the first test, but that sometimes happens so I will let the server
> settle some and try it again later in the day (which also would align
> with Neil's comment on memory fragmentation being a contributor).
>
> Neil:
> I have started a compile with the above kernel + your patch to test
> next unless you or Chuck determine that it isn't needed, or that I
> should test both patches discreetly. As the above is already merged to
> 5.14 it seemed logical to just add your patch on top.
>
> I will also try to set up a vm to test your md5sum scenario with the
> various kernels since it's a much faster thing to test.
>
> - mike
>
> On Fri, Aug 27, 2021 at 7:13 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
> >
> >
> > > On Aug 27, 2021, at 3:14 AM, NeilBrown <neilb@suse.de> wrote:
> > >
> > > Subject: [PATCH] SUNRPC: don't pause on incomplete allocation
> > >
> > > alloc_pages_bulk_array() attempts to allocate at least one page based on
> > > the provided pages, and then opportunistically allocates more if that
> > > can be done without dropping the spinlock.
> > >
> > > So if it returns fewer than requested, that could just mean that it
> > > needed to drop the lock.  In that case, try again immediately.
> > >
> > > Only pause for a time if no progress could be made.
> >
> > The case I was worried about was "no pages available on the
> > pcplist", in which case, alloc_pages_bulk_array() resorts
> > to calling __alloc_pages() and returns only one new page.
> >
> > "No progess" would mean even __alloc_pages() failed.
> >
> > So this patch would behave essentially like the
> > pre-alloc_pages_bulk_array() code: call alloc_page() for
> > each empty struct_page in the array without pausing. That
> > seems correct to me.
> >
> >
> > I would add
> >
> > Fixes: f6e70aab9dfe ("SUNRPC: refresh rq_pages using a bulk page allocator")
> >
> >
> > > Signed-off-by: NeilBrown <neilb@suse.de>
> > > ---
> > > net/sunrpc/svc_xprt.c | 7 +++++--
> > > 1 file changed, 5 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> > > index d66a8e44a1ae..99268dd95519 100644
> > > --- a/net/sunrpc/svc_xprt.c
> > > +++ b/net/sunrpc/svc_xprt.c
> > > @@ -662,7 +662,7 @@ static int svc_alloc_arg(struct svc_rqst *rqstp)
> > > {
> > >       struct svc_serv *serv = rqstp->rq_server;
> > >       struct xdr_buf *arg = &rqstp->rq_arg;
> > > -     unsigned long pages, filled;
> > > +     unsigned long pages, filled, prev;
> > >
> > >       pages = (serv->sv_max_mesg + 2 * PAGE_SIZE) >> PAGE_SHIFT;
> > >       if (pages > RPCSVC_MAXPAGES) {
> > > @@ -672,11 +672,14 @@ static int svc_alloc_arg(struct svc_rqst *rqstp)
> > >               pages = RPCSVC_MAXPAGES;
> > >       }
> > >
> > > -     for (;;) {
> > > +     for (prev = 0;; prev = filled) {
> > >               filled = alloc_pages_bulk_array(GFP_KERNEL, pages,
> > >                                               rqstp->rq_pages);
> > >               if (filled == pages)
> > >                       break;
> > > +             if (filled > prev)
> > > +                     /* Made progress, don't sleep yet */
> > > +                     continue;
> > >
> > >               set_current_state(TASK_INTERRUPTIBLE);
> > >               if (signalled() || kthread_should_stop()) {
> >
> > --
> > Chuck Lever
> >
> >
> >

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: NFS server regression in kernel 5.13 (tested w/ 5.13.9)
  2021-08-27 22:00                                                     ` Mike Javorski
@ 2021-08-27 23:49                                                       ` Chuck Lever III
  2021-08-28  3:22                                                         ` Mike Javorski
  0 siblings, 1 reply; 47+ messages in thread
From: Chuck Lever III @ 2021-08-27 23:49 UTC (permalink / raw)
  To: Mike Javorski; +Cc: Neil Brown, Mel Gorman, Linux NFS Mailing List


> On Aug 27, 2021, at 6:00 PM, Mike Javorski <mike.javorski@gmail.com> wrote:
> 
> OK, an update. Several hours of spaced out testing sessions and the
> first patch seems to have resolved the issue. There may be a very tiny
> bit of lag that still occurs when opening/processing new files, but so
> far on this kernel I have not had any multi-second freezes. I am still
> waiting on the kernel with Neil's patch to compile (compiling on this
> underpowered server so it's taking several hours), but I think the
> testing there will just be to see if I can show it works still, and
> then to try and test in a memory constrained VM. To see if I can
> recreate Neil's experiment. Likely will have to do this over the
> weekend given the kernel compile delay + fiddling with a VM.

Thanks for your testing!


> Chuck: I don't mean to overstep bounds, but is it possible to get that
> patch pulled into 5.13 stable? That may help things for several people
> while 5.14 goes through it's shakedown in archlinux prior to release.

The patch had a Fixes: tag, so it should get automatically backported
to every kernel that has the broken commit. If you don't see it in
a subsequent 5.13 stable kernel, you are free to ask the stable
maintainers to consider it.


> - mike
> 
> On Fri, Aug 27, 2021 at 10:07 AM Mike Javorski <mike.javorski@gmail.com> wrote:
>> 
>> Chuck:
>> I just booted a 5.13.13 kernel with your suggested patch. No freezes
>> on the first test, but that sometimes happens so I will let the server
>> settle some and try it again later in the day (which also would align
>> with Neil's comment on memory fragmentation being a contributor).
>> 
>> Neil:
>> I have started a compile with the above kernel + your patch to test
>> next unless you or Chuck determine that it isn't needed, or that I
>> should test both patches discreetly. As the above is already merged to
>> 5.14 it seemed logical to just add your patch on top.
>> 
>> I will also try to set up a vm to test your md5sum scenario with the
>> various kernels since it's a much faster thing to test.
>> 
>> - mike
>> 
>> On Fri, Aug 27, 2021 at 7:13 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
>>> 
>>> 
>>>> On Aug 27, 2021, at 3:14 AM, NeilBrown <neilb@suse.de> wrote:
>>>> 
>>>> Subject: [PATCH] SUNRPC: don't pause on incomplete allocation
>>>> 
>>>> alloc_pages_bulk_array() attempts to allocate at least one page based on
>>>> the provided pages, and then opportunistically allocates more if that
>>>> can be done without dropping the spinlock.
>>>> 
>>>> So if it returns fewer than requested, that could just mean that it
>>>> needed to drop the lock.  In that case, try again immediately.
>>>> 
>>>> Only pause for a time if no progress could be made.
>>> 
>>> The case I was worried about was "no pages available on the
>>> pcplist", in which case, alloc_pages_bulk_array() resorts
>>> to calling __alloc_pages() and returns only one new page.
>>> 
>>> "No progess" would mean even __alloc_pages() failed.
>>> 
>>> So this patch would behave essentially like the
>>> pre-alloc_pages_bulk_array() code: call alloc_page() for
>>> each empty struct_page in the array without pausing. That
>>> seems correct to me.
>>> 
>>> 
>>> I would add
>>> 
>>> Fixes: f6e70aab9dfe ("SUNRPC: refresh rq_pages using a bulk page allocator")
>>> 
>>> 
>>>> Signed-off-by: NeilBrown <neilb@suse.de>
>>>> ---
>>>> net/sunrpc/svc_xprt.c | 7 +++++--
>>>> 1 file changed, 5 insertions(+), 2 deletions(-)
>>>> 
>>>> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
>>>> index d66a8e44a1ae..99268dd95519 100644
>>>> --- a/net/sunrpc/svc_xprt.c
>>>> +++ b/net/sunrpc/svc_xprt.c
>>>> @@ -662,7 +662,7 @@ static int svc_alloc_arg(struct svc_rqst *rqstp)
>>>> {
>>>>      struct svc_serv *serv = rqstp->rq_server;
>>>>      struct xdr_buf *arg = &rqstp->rq_arg;
>>>> -     unsigned long pages, filled;
>>>> +     unsigned long pages, filled, prev;
>>>> 
>>>>      pages = (serv->sv_max_mesg + 2 * PAGE_SIZE) >> PAGE_SHIFT;
>>>>      if (pages > RPCSVC_MAXPAGES) {
>>>> @@ -672,11 +672,14 @@ static int svc_alloc_arg(struct svc_rqst *rqstp)
>>>>              pages = RPCSVC_MAXPAGES;
>>>>      }
>>>> 
>>>> -     for (;;) {
>>>> +     for (prev = 0;; prev = filled) {
>>>>              filled = alloc_pages_bulk_array(GFP_KERNEL, pages,
>>>>                                              rqstp->rq_pages);
>>>>              if (filled == pages)
>>>>                      break;
>>>> +             if (filled > prev)
>>>> +                     /* Made progress, don't sleep yet */
>>>> +                     continue;
>>>> 
>>>>              set_current_state(TASK_INTERRUPTIBLE);
>>>>              if (signalled() || kthread_should_stop()) {
>>> 
>>> --
>>> Chuck Lever
>>> 
>>> 
>>> 

--
Chuck Lever




^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: NFS server regression in kernel 5.13 (tested w/ 5.13.9)
  2021-08-27 23:49                                                       ` Chuck Lever III
@ 2021-08-28  3:22                                                         ` Mike Javorski
  2021-08-28 18:23                                                           ` Chuck Lever III
  0 siblings, 1 reply; 47+ messages in thread
From: Mike Javorski @ 2021-08-28  3:22 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Neil Brown, Mel Gorman, Linux NFS Mailing List

I had some time this evening (and the kernel finally compiled), and
wanted to get this tested.

The TL;DR:  Both patches are needed

Below are the test results from my replication of Neil's test. It is
readily apparent that both the 5.13.13 kernel AND the 5.13.13 kernel
with the 82011c80b3ec fix exhibit the randomness in read times that
were observed. The 5.13.13 kernel with both the 82011c80b3ec and
f6e70aab9dfe fixes brings the performance back in line with the
5.12.15 kernel which I tested as a baseline.

Please forgive the inconsistency in sample counts. This was running as
a while loop, and I just let it go long enough that the behavior was
consistent. Only change to the VM between tests was the different
kernel + a reboot. The testing PC had a consistent workload during the
entire set of tests.
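
(The measurement loop was essentially the following - the file name and
mount point are illustrative, not my exact setup:

  while true; do
      /usr/bin/time -f '%e' md5sum /mnt/test/sparse.img 2>> times.log
  done

with the stats below computed from times.log.)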

Test 0: 5.13.10 (base kernel in VM image, just for kicks)
==================================================
 Samples 30
 Min 6.839
 Max 19.998
 Median 9.638
 75-P 10.898
 95-P 12.939
 99-P 18.005

Test 1: 5.12.15 (known good)
==================================================
 Samples 152
 Min 1.997
 Max 2.333
 Median 2.171
 75-P 2.230
 95-P 2.286
 99-P 2.312

Test 2: 5.13.13 (known bad)
==================================================
 Samples 42
 Min 3.587
 Max 15.803
 Median 6.039
 75-P 6.452
 95-P 10.293
 99-P 15.540

Test 3: 5.13.13 + 82011c80b3ec fix
==================================================
 Samples 44
 Min 4.309
 Max 37.040
 Median 6.615
 75-P 10.224
 95-P 19.516
 99-P 36.650

Test 4: 5.13.13 + 82011c80b3ec fix + f6e70aab9dfe fix
==================================================
 Samples 131
 Min 2.013
 Max 2.397
 Median 2.169
 75-P 2.211
 95-P 2.283
 99-P 2.348

I am going to run the kernel w/ both fixes over the weekend, but
things look good at this point.

- mike

On Fri, Aug 27, 2021 at 4:49 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
>
>
> > On Aug 27, 2021, at 6:00 PM, Mike Javorski <mike.javorski@gmail.com> wrote:
> >
> > OK, an update. Several hours of spaced out testing sessions and the
> > first patch seems to have resolved the issue. There may be a very tiny
> > bit of lag that still occurs when opening/processing new files, but so
> > far on this kernel I have not had any multi-second freezes. I am still
> > waiting on the kernel with Neil's patch to compile (compiling on this
> > underpowered server so it's taking several hours), but I think the
> > testing there will just be to see if I can show it works still, and
> > then to try and test in a memory constrained VM. To see if I can
> > recreate Neil's experiment. Likely will have to do this over the
> > weekend given the kernel compile delay + fiddling with a VM.
>
> Thanks for your testing!
>
>
> > Chuck: I don't mean to overstep bounds, but is it possible to get that
> > patch pulled into 5.13 stable? That may help things for several people
> > while 5.14 goes through it's shakedown in archlinux prior to release.
>
> The patch had a Fixes: tag, so it should get automatically backported
> to every kernel that has the broken commit. If you don't see it in
> a subsequent 5.13 stable kernel, you are free to ask the stable
> maintainers to consider it.
>
>
> > - mike
> >
> > On Fri, Aug 27, 2021 at 10:07 AM Mike Javorski <mike.javorski@gmail.com> wrote:
> >>
> >> Chuck:
> >> I just booted a 5.13.13 kernel with your suggested patch. No freezes
> >> on the first test, but that sometimes happens so I will let the server
> >> settle some and try it again later in the day (which also would align
> >> with Neil's comment on memory fragmentation being a contributor).
> >>
> >> Neil:
> >> I have started a compile with the above kernel + your patch to test
> >> next unless you or Chuck determine that it isn't needed, or that I
> >> should test both patches discreetly. As the above is already merged to
> >> 5.14 it seemed logical to just add your patch on top.
> >>
> >> I will also try to set up a vm to test your md5sum scenario with the
> >> various kernels since it's a much faster thing to test.
> >>
> >> - mike
> >>
> >> On Fri, Aug 27, 2021 at 7:13 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
> >>>
> >>>
> >>>> On Aug 27, 2021, at 3:14 AM, NeilBrown <neilb@suse.de> wrote:
> >>>>
> >>>> Subject: [PATCH] SUNRPC: don't pause on incomplete allocation
> >>>>
> >>>> alloc_pages_bulk_array() attempts to allocate at least one page based on
> >>>> the provided pages, and then opportunistically allocates more if that
> >>>> can be done without dropping the spinlock.
> >>>>
> >>>> So if it returns fewer than requested, that could just mean that it
> >>>> needed to drop the lock.  In that case, try again immediately.
> >>>>
> >>>> Only pause for a time if no progress could be made.
> >>>
> >>> The case I was worried about was "no pages available on the
> >>> pcplist", in which case, alloc_pages_bulk_array() resorts
> >>> to calling __alloc_pages() and returns only one new page.
> >>>
> >>> "No progess" would mean even __alloc_pages() failed.
> >>>
> >>> So this patch would behave essentially like the
> >>> pre-alloc_pages_bulk_array() code: call alloc_page() for
> >>> each empty struct_page in the array without pausing. That
> >>> seems correct to me.
> >>>
> >>>
> >>> I would add
> >>>
> >>> Fixes: f6e70aab9dfe ("SUNRPC: refresh rq_pages using a bulk page allocator")
> >>>
> >>>
> >>>> Signed-off-by: NeilBrown <neilb@suse.de>
> >>>> ---
> >>>> net/sunrpc/svc_xprt.c | 7 +++++--
> >>>> 1 file changed, 5 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> >>>> index d66a8e44a1ae..99268dd95519 100644
> >>>> --- a/net/sunrpc/svc_xprt.c
> >>>> +++ b/net/sunrpc/svc_xprt.c
> >>>> @@ -662,7 +662,7 @@ static int svc_alloc_arg(struct svc_rqst *rqstp)
> >>>> {
> >>>>      struct svc_serv *serv = rqstp->rq_server;
> >>>>      struct xdr_buf *arg = &rqstp->rq_arg;
> >>>> -     unsigned long pages, filled;
> >>>> +     unsigned long pages, filled, prev;
> >>>>
> >>>>      pages = (serv->sv_max_mesg + 2 * PAGE_SIZE) >> PAGE_SHIFT;
> >>>>      if (pages > RPCSVC_MAXPAGES) {
> >>>> @@ -672,11 +672,14 @@ static int svc_alloc_arg(struct svc_rqst *rqstp)
> >>>>              pages = RPCSVC_MAXPAGES;
> >>>>      }
> >>>>
> >>>> -     for (;;) {
> >>>> +     for (prev = 0;; prev = filled) {
> >>>>              filled = alloc_pages_bulk_array(GFP_KERNEL, pages,
> >>>>                                              rqstp->rq_pages);
> >>>>              if (filled == pages)
> >>>>                      break;
> >>>> +             if (filled > prev)
> >>>> +                     /* Made progress, don't sleep yet */
> >>>> +                     continue;
> >>>>
> >>>>              set_current_state(TASK_INTERRUPTIBLE);
> >>>>              if (signalled() || kthread_should_stop()) {
> >>>
> >>> --
> >>> Chuck Lever
> >>>
> >>>
> >>>
>
> --
> Chuck Lever
>
>
>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: NFS server regression in kernel 5.13 (tested w/ 5.13.9)
  2021-08-28  3:22                                                         ` Mike Javorski
@ 2021-08-28 18:23                                                           ` Chuck Lever III
  2021-08-29 22:28                                                             ` [PATCH] MM: clarify effort used in alloc_pages_bulk_*() NeilBrown
                                                                               ` (2 more replies)
  0 siblings, 3 replies; 47+ messages in thread
From: Chuck Lever III @ 2021-08-28 18:23 UTC (permalink / raw)
  To: Mike Javorski, Mel Gorman; +Cc: Neil Brown, Linux NFS Mailing List



> On Aug 27, 2021, at 11:22 PM, Mike Javorski <mike.javorski@gmail.com> wrote:
> 
> I had some time this evening (and the kernel finally compiled), and
> wanted to get this tested.
> 
> The TL;DR:  Both patches are needed
> 
> Below are the test results from my replication of Neil's test. It is
> readily apparent that both the 5.13.13 kernel AND the 5.13.13 kernel
> with the 82011c80b3ec fix exhibit the randomness in read times that
> were observed. The 5.13.13 kernel with both the 82011c80b3ec and
> f6e70aab9dfe fixes brings the performance back in line with the
> 5.12.15 kernel which I tested as a baseline.
> 
> Please forgive the inconsistency in sample counts. This was running as
> a while loop, and I just let it go long enough that the behavior was
> consistent. Only change to the VM between tests was the different
> kernel + a reboot. The testing PC had a consistent workload during the
> entire set of tests.
> 
> Test 0: 5.13.10 (base kernel in VM image, just for kicks)
> ==================================================
> Samples 30
> Min 6.839
> Max 19.998
> Median 9.638
> 75-P 10.898
> 95-P 12.939
> 99-P 18.005
> 
> Test 1: 5.12.15 (known good)
> ==================================================
> Samples 152
> Min 1.997
> Max 2.333
> Median 2.171
> 75-P 2.230
> 95-P 2.286
> 99-P 2.312
> 
> Test 2: 5.13.13 (known bad)
> ==================================================
> Samples 42
> Min 3.587
> Max 15.803
> Median 6.039
> 75-P 6.452
> 95-P 10.293
> 99-P 15.540
> 
> Test 3: 5.13.13 + 82011c80b3ec fix
> ==================================================
> Samples 44
> Min 4.309
> Max 37.040
> Median 6.615
> 75-P 10.224
> 95-P 19.516
> 99-P 36.650
> 
> Test 4: 5.13.13 + 82011c80b3ec fix + f6e70aab9dfe fix
> ==================================================
> Samples 131
> Min 2.013
> Max 2.397
> Median 2.169
> 75-P 2.211
> 95-P 2.283
> 99-P 2.348
> 
> I am going to run the kernel w/ both fixes over the weekend, but
> things look good at this point.
> 
> - mike

I've targeted Neil's fix for the first 5.15-rc NFSD pull request.
I'd like to have Mel's Reviewed-by or Acked-by, though.

I will add a Fixes: tag if Neil doesn't repost (no reason to at
this point) so the fix should get backported automatically to
recent stable kernels.


> On Fri, Aug 27, 2021 at 4:49 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
>> 
>> 
>>> On Aug 27, 2021, at 6:00 PM, Mike Javorski <mike.javorski@gmail.com> wrote:
>>> 
>>> OK, an update. Several hours of spaced out testing sessions and the
>>> first patch seems to have resolved the issue. There may be a very tiny
>>> bit of lag that still occurs when opening/processing new files, but so
>>> far on this kernel I have not had any multi-second freezes. I am still
>>> waiting on the kernel with Neil's patch to compile (compiling on this
>>> underpowered server so it's taking several hours), but I think the
>>> testing there will just be to see if I can show it works still, and
>>> then to try and test in a memory constrained VM. To see if I can
>>> recreate Neil's experiment. Likely will have to do this over the
>>> weekend given the kernel compile delay + fiddling with a VM.
>> 
>> Thanks for your testing!
>> 
>> 
>>> Chuck: I don't mean to overstep bounds, but is it possible to get that
>>> patch pulled into 5.13 stable? That may help things for several people
>>> while 5.14 goes through it's shakedown in archlinux prior to release.
>> 
>> The patch had a Fixes: tag, so it should get automatically backported
>> to every kernel that has the broken commit. If you don't see it in
>> a subsequent 5.13 stable kernel, you are free to ask the stable
>> maintainers to consider it.
>> 
>> 
>>> - mike
>>> 
>>> On Fri, Aug 27, 2021 at 10:07 AM Mike Javorski <mike.javorski@gmail.com> wrote:
>>>> 
>>>> Chuck:
>>>> I just booted a 5.13.13 kernel with your suggested patch. No freezes
>>>> on the first test, but that sometimes happens so I will let the server
>>>> settle some and try it again later in the day (which also would align
>>>> with Neil's comment on memory fragmentation being a contributor).
>>>> 
>>>> Neil:
>>>> I have started a compile with the above kernel + your patch to test
>>>> next unless you or Chuck determine that it isn't needed, or that I
>>>> should test both patches discreetly. As the above is already merged to
>>>> 5.14 it seemed logical to just add your patch on top.
>>>> 
>>>> I will also try to set up a vm to test your md5sum scenario with the
>>>> various kernels since it's a much faster thing to test.
>>>> 
>>>> - mike
>>>> 
>>>> On Fri, Aug 27, 2021 at 7:13 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
>>>>> 
>>>>> 
>>>>>> On Aug 27, 2021, at 3:14 AM, NeilBrown <neilb@suse.de> wrote:
>>>>>> 
>>>>>> Subject: [PATCH] SUNRPC: don't pause on incomplete allocation
>>>>>> 
>>>>>> alloc_pages_bulk_array() attempts to allocate at least one page based on
>>>>>> the provided pages, and then opportunistically allocates more if that
>>>>>> can be done without dropping the spinlock.
>>>>>> 
>>>>>> So if it returns fewer than requested, that could just mean that it
>>>>>> needed to drop the lock.  In that case, try again immediately.
>>>>>> 
>>>>>> Only pause for a time if no progress could be made.
>>>>> 
>>>>> The case I was worried about was "no pages available on the
>>>>> pcplist", in which case, alloc_pages_bulk_array() resorts
>>>>> to calling __alloc_pages() and returns only one new page.
>>>>> 
>>>>> "No progess" would mean even __alloc_pages() failed.
>>>>> 
>>>>> So this patch would behave essentially like the
>>>>> pre-alloc_pages_bulk_array() code: call alloc_page() for
>>>>> each empty struct_page in the array without pausing. That
>>>>> seems correct to me.
>>>>> 
>>>>> 
>>>>> I would add
>>>>> 
>>>>> Fixes: f6e70aab9dfe ("SUNRPC: refresh rq_pages using a bulk page allocator")
>>>>> 
>>>>> 
>>>>>> Signed-off-by: NeilBrown <neilb@suse.de>
>>>>>> ---
>>>>>> net/sunrpc/svc_xprt.c | 7 +++++--
>>>>>> 1 file changed, 5 insertions(+), 2 deletions(-)
>>>>>> 
>>>>>> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
>>>>>> index d66a8e44a1ae..99268dd95519 100644
>>>>>> --- a/net/sunrpc/svc_xprt.c
>>>>>> +++ b/net/sunrpc/svc_xprt.c
>>>>>> @@ -662,7 +662,7 @@ static int svc_alloc_arg(struct svc_rqst *rqstp)
>>>>>> {
>>>>>>     struct svc_serv *serv = rqstp->rq_server;
>>>>>>     struct xdr_buf *arg = &rqstp->rq_arg;
>>>>>> -     unsigned long pages, filled;
>>>>>> +     unsigned long pages, filled, prev;
>>>>>> 
>>>>>>     pages = (serv->sv_max_mesg + 2 * PAGE_SIZE) >> PAGE_SHIFT;
>>>>>>     if (pages > RPCSVC_MAXPAGES) {
>>>>>> @@ -672,11 +672,14 @@ static int svc_alloc_arg(struct svc_rqst *rqstp)
>>>>>>             pages = RPCSVC_MAXPAGES;
>>>>>>     }
>>>>>> 
>>>>>> -     for (;;) {
>>>>>> +     for (prev = 0;; prev = filled) {
>>>>>>             filled = alloc_pages_bulk_array(GFP_KERNEL, pages,
>>>>>>                                             rqstp->rq_pages);
>>>>>>             if (filled == pages)
>>>>>>                     break;
>>>>>> +             if (filled > prev)
>>>>>> +                     /* Made progress, don't sleep yet */
>>>>>> +                     continue;
>>>>>> 
>>>>>>             set_current_state(TASK_INTERRUPTIBLE);
>>>>>>             if (signalled() || kthread_should_stop()) {
>>>>> 
>>>>> --
>>>>> Chuck Lever
>>>>> 
>>>>> 
>>>>> 
>> 
>> --
>> Chuck Lever
>> 
>> 
>> 

--
Chuck Lever




^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH] MM: clarify effort used in alloc_pages_bulk_*()
  2021-08-28 18:23                                                           ` Chuck Lever III
@ 2021-08-29 22:28                                                             ` NeilBrown
  2021-08-30  9:11                                                               ` Mel Gorman
  2021-08-29 22:36                                                             ` [PATCH] SUNRPC: don't pause on incomplete allocation NeilBrown
  2021-09-04 17:41                                                             ` NFS server regression in kernel 5.13 (tested w/ 5.13.9) Mike Javorski
  2 siblings, 1 reply; 47+ messages in thread
From: NeilBrown @ 2021-08-29 22:28 UTC (permalink / raw)
  To: Chuck Lever III, Mel Gorman
  Cc: Mike Javorski, Linux NFS Mailing List, linux-mm


The alloc_pages_bulk_*() interfaces do not make it clear what degree
of success can be expected.

The code appears to allocate at least one page with the same effort as
alloc_page() when used with the same GFP flags, and then to allocate any
more only opportunistically.  So a caller should not *expect* to get a
full allocation, but should not be *surprised* by one either.

Add text to the comment clarifying this.

Also fix a bug.  When alloc_pages_bulk_array() is called on an array
that is partially allocated, the level of effort to get a single page is
less than when the array was completely unallocated.  This behaviour is
inconsistent, but now fixed.

Fixes: 0f87d9d30f21 ("mm/page_alloc: add an array-based interface to the bulk page allocator")
Signed-off-by: NeilBrown <neilb@suse.de>
---
 mm/page_alloc.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index eeb3a9cb36bb..083316c45964 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5206,6 +5206,11 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
  * is the maximum number of pages that will be stored in the array.
  *
  * Returns the number of pages on the list or array.
+ *
+ * At least one page will be allocated if that is possible while
+ * remaining consistent with @gfp.  Extra pages up to the requested
+ * total will be allocated opportunistically when doing so is
+ * significantly cheaper than having the caller repeat the request.
  */
 unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 			nodemask_t *nodemask, int nr_pages,
@@ -5307,7 +5312,7 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 								pcp, pcp_list);
 		if (unlikely(!page)) {
 			/* Try and get at least one page */
-			if (!nr_populated)
+			if (!nr_account)
 				goto failed_irq;
 			break;
 		}
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH] SUNRPC: don't pause on incomplete allocation
  2021-08-28 18:23                                                           ` Chuck Lever III
  2021-08-29 22:28                                                             ` [PATCH] MM: clarify effort used in alloc_pages_bulk_*() NeilBrown
@ 2021-08-29 22:36                                                             ` NeilBrown
  2021-08-30  9:12                                                               ` Mel Gorman
  2021-08-30 20:46                                                               ` J. Bruce Fields
  2021-09-04 17:41                                                             ` NFS server regression in kernel 5.13 (tested w/ 5.13.9) Mike Javorski
  2 siblings, 2 replies; 47+ messages in thread
From: NeilBrown @ 2021-08-29 22:36 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Mike Javorski, Mel Gorman, Linux NFS Mailing List


alloc_pages_bulk_array() attempts to allocate at least one page based on
the provided pages, and then opportunistically allocates more if that
can be done without dropping the spinlock.

So if it returns fewer than requested, that could just mean that it
needed to drop the lock.  In that case, try again immediately.

Only pause for a time if no progress could be made.

Reported-and-tested-by: Mike Javorski <mike.javorski@gmail.com>
Reported-and-tested-by: Lothar Paltins <lopa@mailbox.org>
Fixes: f6e70aab9dfe ("SUNRPC: refresh rq_pages using a bulk page allocator")
Signed-off-by: NeilBrown <neilb@suse.de>
---

I decided I would resend, as I thought the for() loops could be clearer.
This patch should perform exactly the same as the previous one.


 net/sunrpc/svc_xprt.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index d66a8e44a1ae..e74d5cf3cbb4 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -662,7 +662,7 @@ static int svc_alloc_arg(struct svc_rqst *rqstp)
 {
 	struct svc_serv *serv = rqstp->rq_server;
 	struct xdr_buf *arg = &rqstp->rq_arg;
-	unsigned long pages, filled;
+	unsigned long pages, filled, ret;
 
 	pages = (serv->sv_max_mesg + 2 * PAGE_SIZE) >> PAGE_SHIFT;
 	if (pages > RPCSVC_MAXPAGES) {
@@ -672,11 +672,12 @@ static int svc_alloc_arg(struct svc_rqst *rqstp)
 		pages = RPCSVC_MAXPAGES;
 	}
 
-	for (;;) {
-		filled = alloc_pages_bulk_array(GFP_KERNEL, pages,
-						rqstp->rq_pages);
-		if (filled == pages)
-			break;
+	for (filled = 0; filled < pages; filled = ret) {
+		ret = alloc_pages_bulk_array(GFP_KERNEL, pages,
+					     rqstp->rq_pages);
+		if (ret > filled)
+			/* Made progress, don't sleep yet */
+			continue;
 
 		set_current_state(TASK_INTERRUPTIBLE);
 		if (signalled() || kthread_should_stop()) {
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH] MM: clarify effort used in alloc_pages_bulk_*()
  2021-08-29 22:28                                                             ` [PATCH] MM: clarify effort used in alloc_pages_bulk_*() NeilBrown
@ 2021-08-30  9:11                                                               ` Mel Gorman
  0 siblings, 0 replies; 47+ messages in thread
From: Mel Gorman @ 2021-08-30  9:11 UTC (permalink / raw)
  To: NeilBrown
  Cc: Chuck Lever III, Mike Javorski, Linux NFS Mailing List, linux-mm

On Mon, Aug 30, 2021 at 08:28:15AM +1000, NeilBrown wrote:
> 
> The alloc_pages_bulk_*() interfaces do not make it clear what degree
> of success can be expected.
> 
> The code appears to allocate at least one page with the same effort as
> alloc_page() when used with the same GFP flags, and then to allocate any
> more only opportunistically.  So a caller should not *expect* to get a
> full allocation, but should not be *surprised* by one either.
> 
> Add text to the comment clarifying this.
> 
> Also fix a bug.  When alloc_pages_bulk_array() is called on an array
> that is partially allocated, the level of effort to get a single page is
> less than when the array was completely unallocated.  This behaviour is
> inconsistent, but now fixed.
> 
> Fixes: 0f87d9d30f21 ("mm/page_alloc: add an array-based interface to the bulk page allocator")
> Signed-off-by: NeilBrown <neilb@suse.de>

Acked-by: Mel Gorman <mgorman@suse.com>

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] SUNRPC: don't pause on incomplete allocation
  2021-08-29 22:36                                                             ` [PATCH] SUNRPC: don't pause on incomplete allocation NeilBrown
@ 2021-08-30  9:12                                                               ` Mel Gorman
  2021-08-30 20:46                                                               ` J. Bruce Fields
  1 sibling, 0 replies; 47+ messages in thread
From: Mel Gorman @ 2021-08-30  9:12 UTC (permalink / raw)
  To: NeilBrown; +Cc: Chuck Lever III, Mike Javorski, Linux NFS Mailing List

On Mon, Aug 30, 2021 at 08:36:34AM +1000, NeilBrown wrote:
> 
> alloc_pages_bulk_array() attempts to allocate at least one page based on
> the provided pages, and then opportunistically allocates more if that
> can be done without dropping the spinlock.
> 
> So if it returns fewer than requested, that could just mean that it
> needed to drop the lock.  In that case, try again immediately.
> 
> Only pause for a time if no progress could be made.
> 
> Reported-and-tested-by: Mike Javorski <mike.javorski@gmail.com>
> Reported-and-tested-by: Lothar Paltins <lopa@mailbox.org>
> Fixes: f6e70aab9dfe ("SUNRPC: refresh rq_pages using a bulk page allocator")
> Signed-off-by: NeilBrown <neilb@suse.de>

Acked-by: Mel Gorman <mgorman@suse.com>

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] SUNRPC: don't pause on incomplete allocation
  2021-08-29 22:36                                                             ` [PATCH] SUNRPC: don't pause on incomplete allocation NeilBrown
  2021-08-30  9:12                                                               ` Mel Gorman
@ 2021-08-30 20:46                                                               ` J. Bruce Fields
  1 sibling, 0 replies; 47+ messages in thread
From: J. Bruce Fields @ 2021-08-30 20:46 UTC (permalink / raw)
  To: NeilBrown
  Cc: Chuck Lever III, Mike Javorski, Mel Gorman, Linux NFS Mailing List

For what it's worth, another positive report on this patch:
https://bugzilla.kernel.org/show_bug.cgi?id=213869#c10

--b.

On Mon, Aug 30, 2021 at 08:36:34AM +1000, NeilBrown wrote:
> 
> alloc_pages_bulk_array() attempts to allocate at least one page based on
> the provided pages, and then opportunistically allocates more if that
> can be done without dropping the spinlock.
> 
> So if it returns fewer than requested, that could just mean that it
> needed to drop the lock.  In that case, try again immediately.
> 
> Only pause for a time if no progress could be made.
> 
> Reported-and-tested-by: Mike Javorski <mike.javorski@gmail.com>
> Reported-and-tested-by: Lothar Paltins <lopa@mailbox.org>
> Fixes: f6e70aab9dfe ("SUNRPC: refresh rq_pages using a bulk page allocator")
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
> 
> I decided I would resend, as I thought the for() loops could be clearer.
> This patch should perform exactly the same as the previous one.
> 
> 
>  net/sunrpc/svc_xprt.c | 13 +++++++------
>  1 file changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> index d66a8e44a1ae..e74d5cf3cbb4 100644
> --- a/net/sunrpc/svc_xprt.c
> +++ b/net/sunrpc/svc_xprt.c
> @@ -662,7 +662,7 @@ static int svc_alloc_arg(struct svc_rqst *rqstp)
>  {
>  	struct svc_serv *serv = rqstp->rq_server;
>  	struct xdr_buf *arg = &rqstp->rq_arg;
> -	unsigned long pages, filled;
> +	unsigned long pages, filled, ret;
>  
>  	pages = (serv->sv_max_mesg + 2 * PAGE_SIZE) >> PAGE_SHIFT;
>  	if (pages > RPCSVC_MAXPAGES) {
> @@ -672,11 +672,12 @@ static int svc_alloc_arg(struct svc_rqst *rqstp)
>  		pages = RPCSVC_MAXPAGES;
>  	}
>  
> -	for (;;) {
> -		filled = alloc_pages_bulk_array(GFP_KERNEL, pages,
> -						rqstp->rq_pages);
> -		if (filled == pages)
> -			break;
> +	for (filled = 0; filled < pages; filled = ret) {
> +		ret = alloc_pages_bulk_array(GFP_KERNEL, pages,
> +					     rqstp->rq_pages);
> +		if (ret > filled)
> +			/* Made progress, don't sleep yet */
> +			continue;
>  
>  		set_current_state(TASK_INTERRUPTIBLE);
>  		if (signalled() || kthread_should_stop()) {
> -- 
> 2.32.0

^ permalink raw reply	[flat|nested] 47+ messages in thread
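
For context, the loop in the patch above boils down to a simple policy:
retry the bulk allocator immediately whenever a call fills at least one
more slot, and sleep only when a call makes no progress at all.  The
stand-alone C sketch below illustrates that policy in userspace; the
bulk_alloc() helper is a made-up stand-in for alloc_pages_bulk_array()
(it only fills a couple of slots per call, as if the allocator had to
drop its lock early), so this shows the retry shape, not the kernel
code itself.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define NPAGES 8

/* Stand-in for alloc_pages_bulk_array(): populates at most two empty
 * slots per call, as if the allocator bailed out after dropping its
 * lock.  Returns the total number of slots now populated. */
static size_t bulk_alloc(size_t want, void **pages)
{
	size_t filled = 0, added = 0, i;

	for (i = 0; i < want; i++) {
		if (!pages[i] && added < 2) {
			pages[i] = malloc(4096);
			added++;
		}
		if (pages[i])
			filled++;
	}
	return filled;
}

int main(void)
{
	void *pages[NPAGES] = { NULL };
	size_t filled, ret;

	/* Same shape as svc_alloc_arg() after the patch: retry at once
	 * while progress is being made, back off only when it stalls. */
	for (filled = 0; filled < NPAGES; filled = ret) {
		ret = bulk_alloc(NPAGES, pages);
		printf("bulk_alloc() filled %zu of %d slots\n", ret, NPAGES);
		if (ret > filled)
			continue;	/* made progress, don't sleep yet */
		sleep(1);		/* no progress, pause before retrying */
	}
	return 0;
}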

* Re: NFS server regression in kernel 5.13 (tested w/ 5.13.9)
  2021-08-28 18:23                                                           ` Chuck Lever III
  2021-08-29 22:28                                                             ` [PATCH] MM: clarify effort used in alloc_pages_bulk_*() NeilBrown
  2021-08-29 22:36                                                             ` [PATCH] SUNRPC: don't pause on incomplete allocation NeilBrown
@ 2021-09-04 17:41                                                             ` Mike Javorski
  2021-09-05  2:02                                                               ` Chuck Lever III
  2 siblings, 1 reply; 47+ messages in thread
From: Mike Javorski @ 2021-09-04 17:41 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Mel Gorman, Neil Brown, Linux NFS Mailing List

Hi Chuck.

I noticed that you sent in the 5.15 pull request but Neil's fix
(e38b3f20059426a0adbde014ff71071739ab5226 in your tree) missed the
pull and thus the fix isn't going to be backported to 5.14 in the near
term. Is there another 5.15 pull planned in the not too distant future
so this will get flagged for back-porting, or do I need to reach out
to someone to expressly pull it into 5.14? If the latter, can you
point me in the right direction of who to ask (I assume it's someone
other than Greg KH)?

Thanks

- mike


On Sat, Aug 28, 2021 at 11:23 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
>
>
>
> > On Aug 27, 2021, at 11:22 PM, Mike Javorski <mike.javorski@gmail.com> wrote:
> >
> > I had some time this evening (and the kernel finally compiled), and
> > wanted to get this tested.
> >
> > The TL;DR:  Both patches are needed
> >
> > Below are the test results from my replication of Neil's test. It is
> > readily apparent that both the 5.13.13 kernel AND the 5.13.13 kernel
> > with the 82011c80b3ec fix exhibit the randomness in read times that
> > were observed. The 5.13.13 kernel with both the 82011c80b3ec and
> > f6e70aab9dfe fixes brings the performance back in line with the
> > 5.12.15 kernel which I tested as a baseline.
> >
> > Please forgive the inconsistency in sample counts. This was running as
> > a while loop, and I just let it go long enough that the behavior was
> > consistent. Only change to the VM between tests was the different
> > kernel + a reboot. The testing PC had a consistent workload during the
> > entire set of tests.
> >
> > Test 0: 5.13.10 (base kernel in VM image, just for kicks)
> > ==================================================
> > Samples 30
> > Min 6.839
> > Max 19.998
> > Median 9.638
> > 75-P 10.898
> > 95-P 12.939
> > 99-P 18.005
> >
> > Test 1: 5.12.15 (known good)
> > ==================================================
> > Samples 152
> > Min 1.997
> > Max 2.333
> > Median 2.171
> > 75-P 2.230
> > 95-P 2.286
> > 99-P 2.312
> >
> > Test 2: 5.13.13 (known bad)
> > ==================================================
> > Samples 42
> > Min 3.587
> > Max 15.803
> > Median 6.039
> > 75-P 6.452
> > 95-P 10.293
> > 99-P 15.540
> >
> > Test 3: 5.13.13 + 82011c80b3ec fix
> > ==================================================
> > Samples 44
> > Min 4.309
> > Max 37.040
> > Median 6.615
> > 75-P 10.224
> > 95-P 19.516
> > 99-P 36.650
> >
> > Test 4: 5.13.13 + 82011c80b3ec fix + f6e70aab9dfe fix
> > ==================================================
> > Samples 131
> > Min 2.013
> > Max 2.397
> > Median 2.169
> > 75-P 2.211
> > 95-P 2.283
> > 99-P 2.348
> >
> > I am going to run the kernel w/ both fixes over the weekend, but
> > things look good at this point.
> >
> > - mike
>
> I've targeted Neil's fix for the first 5.15-rc NFSD pull request.
> I'd like to have Mel's Reviewed-by or Acked-by, though.
>
> I will add a Fixes: tag if Neil doesn't repost (no reason to at
> this point) so the fix should get backported automatically to
> recent stable kernels.
>
>
> > On Fri, Aug 27, 2021 at 4:49 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
> >>
> >>
> >>> On Aug 27, 2021, at 6:00 PM, Mike Javorski <mike.javorski@gmail.com> wrote:
> >>>
> >>> OK, an update. Several hours of spaced out testing sessions and the
> >>> first patch seems to have resolved the issue. There may be a very tiny
> >>> bit of lag that still occurs when opening/processing new files, but so
> >>> far on this kernel I have not had any multi-second freezes. I am still
> >>> waiting on the kernel with Neil's patch to compile (compiling on this
> >>> underpowered server so it's taking several hours), but I think the
> >>> testing there will just be to see if I can show it works still, and
> >>> then to try and test in a memory-constrained VM, to see if I can
> >>> recreate Neil's experiment. Likely will have to do this over the
> >>> weekend given the kernel compile delay + fiddling with a VM.
> >>
> >> Thanks for your testing!
> >>
> >>
> >>> Chuck: I don't mean to overstep bounds, but is it possible to get that
> >>> patch pulled into 5.13 stable? That may help things for several people
> >>> while 5.14 goes through its shakedown in archlinux prior to release.
> >>
> >> The patch had a Fixes: tag, so it should get automatically backported
> >> to every kernel that has the broken commit. If you don't see it in
> >> a subsequent 5.13 stable kernel, you are free to ask the stable
> >> maintainers to consider it.
> >>
> >>
> >>> - mike
> >>>
> >>> On Fri, Aug 27, 2021 at 10:07 AM Mike Javorski <mike.javorski@gmail.com> wrote:
> >>>>
> >>>> Chuck:
> >>>> I just booted a 5.13.13 kernel with your suggested patch. No freezes
> >>>> on the first test, but that sometimes happens so I will let the server
> >>>> settle some and try it again later in the day (which also would align
> >>>> with Neil's comment on memory fragmentation being a contributor).
> >>>>
> >>>> Neil:
> >>>> I have started a compile with the above kernel + your patch to test
> >>>> next unless you or Chuck determine that it isn't needed, or that I
> >>>> should test both patches discretely. As the above is already merged to
> >>>> 5.14 it seemed logical to just add your patch on top.
> >>>>
> >>>> I will also try to set up a vm to test your md5sum scenario with the
> >>>> various kernels since it's a much faster thing to test.
> >>>>
> >>>> - mike
> >>>>
> >>>> On Fri, Aug 27, 2021 at 7:13 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
> >>>>>
> >>>>>
> >>>>>> On Aug 27, 2021, at 3:14 AM, NeilBrown <neilb@suse.de> wrote:
> >>>>>>
> >>>>>> Subject: [PATCH] SUNRPC: don't pause on incomplete allocation
> >>>>>>
> >>>>>> alloc_pages_bulk_array() attempts to allocate at least one page based on
> >>>>>> the provided pages, and then opportunistically allocates more if that
> >>>>>> can be done without dropping the spinlock.
> >>>>>>
> >>>>>> So if it returns fewer than requested, that could just mean that it
> >>>>>> needed to drop the lock.  In that case, try again immediately.
> >>>>>>
> >>>>>> Only pause for a time if no progress could be made.
> >>>>>
> >>>>> The case I was worried about was "no pages available on the
> >>>>> pcplist", in which case, alloc_pages_bulk_array() resorts
> >>>>> to calling __alloc_pages() and returns only one new page.
> >>>>>
> >>>>> "No progess" would mean even __alloc_pages() failed.
> >>>>>
> >>>>> So this patch would behave essentially like the
> >>>>> pre-alloc_pages_bulk_array() code: call alloc_page() for
> >>>>> each empty struct_page in the array without pausing. That
> >>>>> seems correct to me.
> >>>>>
> >>>>>
> >>>>> I would add
> >>>>>
> >>>>> Fixes: f6e70aab9dfe ("SUNRPC: refresh rq_pages using a bulk page allocator")
> >>>>>
> >>>>>
> >>>>>> Signed-off-by: NeilBrown <neilb@suse.de>
> >>>>>> ---
> >>>>>> net/sunrpc/svc_xprt.c | 7 +++++--
> >>>>>> 1 file changed, 5 insertions(+), 2 deletions(-)
> >>>>>>
> >>>>>> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> >>>>>> index d66a8e44a1ae..99268dd95519 100644
> >>>>>> --- a/net/sunrpc/svc_xprt.c
> >>>>>> +++ b/net/sunrpc/svc_xprt.c
> >>>>>> @@ -662,7 +662,7 @@ static int svc_alloc_arg(struct svc_rqst *rqstp)
> >>>>>> {
> >>>>>>     struct svc_serv *serv = rqstp->rq_server;
> >>>>>>     struct xdr_buf *arg = &rqstp->rq_arg;
> >>>>>> -     unsigned long pages, filled;
> >>>>>> +     unsigned long pages, filled, prev;
> >>>>>>
> >>>>>>     pages = (serv->sv_max_mesg + 2 * PAGE_SIZE) >> PAGE_SHIFT;
> >>>>>>     if (pages > RPCSVC_MAXPAGES) {
> >>>>>> @@ -672,11 +672,14 @@ static int svc_alloc_arg(struct svc_rqst *rqstp)
> >>>>>>             pages = RPCSVC_MAXPAGES;
> >>>>>>     }
> >>>>>>
> >>>>>> -     for (;;) {
> >>>>>> +     for (prev = 0;; prev = filled) {
> >>>>>>             filled = alloc_pages_bulk_array(GFP_KERNEL, pages,
> >>>>>>                                             rqstp->rq_pages);
> >>>>>>             if (filled == pages)
> >>>>>>                     break;
> >>>>>> +             if (filled > prev)
> >>>>>> +                     /* Made progress, don't sleep yet */
> >>>>>> +                     continue;
> >>>>>>
> >>>>>>             set_current_state(TASK_INTERRUPTIBLE);
> >>>>>>             if (signalled() || kthread_should_stop()) {
> >>>>>
> >>>>> --
> >>>>> Chuck Lever
> >>>>>
> >>>>>
> >>>>>
> >>
> >> --
> >> Chuck Lever
> >>
> >>
> >>
>
> --
> Chuck Lever
>
>
>

^ permalink raw reply	[flat|nested] 47+ messages in thread
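
For anyone wanting to reproduce numbers in the shape of the tables
above: the thread describes the workload only loosely (a while loop
replicating Neil's md5sum-over-NFS test), so the small C harness below
is a hypothetical stand-in rather than the script actually used.  It
prints one wall-clock sample per run (the time to read every file named
on the command line); repeated runs against files on the NFS mount can
then be sorted into the min/median/percentile figures reported.

#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	static char buf[1 << 20];	/* 1 MiB read buffer */
	struct timespec t0, t1;
	int i;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 1; i < argc; i++) {
		int fd = open(argv[i], O_RDONLY);

		if (fd < 0) {
			perror(argv[i]);
			continue;
		}
		while (read(fd, buf, sizeof(buf)) > 0)
			;	/* discard the data; only elapsed time matters */
		close(fd);
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	/* One sample per invocation, in seconds, e.g. "2.171" */
	printf("%.3f\n", (t1.tv_sec - t0.tv_sec) +
			 (t1.tv_nsec - t0.tv_nsec) / 1e9);
	return 0;
}

(Compile with something like "cc -O2 -o readtime readtime.c" and run it
repeatedly against the files on the NFS mount; the file name is made up
here.)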

* Re: NFS server regression in kernel 5.13 (tested w/ 5.13.9)
  2021-09-04 17:41                                                             ` NFS server regression in kernel 5.13 (tested w/ 5.13.9) Mike Javorski
@ 2021-09-05  2:02                                                               ` Chuck Lever III
  2021-09-16  2:45                                                                 ` Mike Javorski
  0 siblings, 1 reply; 47+ messages in thread
From: Chuck Lever III @ 2021-09-05  2:02 UTC (permalink / raw)
  To: Mike Javorski; +Cc: Mel Gorman, Neil Brown, Linux NFS Mailing List


> On Sep 4, 2021, at 1:41 PM, Mike Javorski <mike.javorski@gmail.com> wrote:
> 
> Hi Chuck.
> 
> I noticed that you sent in the 5.15 pull request but Neil's fix
> (e38b3f20059426a0adbde014ff71071739ab5226 in your tree) missed the
> pull and thus the fix isn't going to be backported to 5.14 in the near
> term. Is there another 5.15 pull planned in the not too distant future
> so this will get flagged for back-porting,

Yes. The final version of Neil’s patch was just a little late for the initial v5.15 NFSD pull request (IMO) so it’s queued for the next PR, probably this week.


> or do I need to reach out to someone to expressly pull it into 5.14? If the latter, can you
> point me in the right direction of who to ask (I assume it's someone
> other than Greg KH)?
> 
> Thanks
> 
> - mike
> 
> 
>> On Sat, Aug 28, 2021 at 11:23 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
>> 
>> 
>> 
>>>> On Aug 27, 2021, at 11:22 PM, Mike Javorski <mike.javorski@gmail.com> wrote:
>>> 
>>> I had some time this evening (and the kernel finally compiled), and
>>> wanted to get this tested.
>>> 
>>> The TL;DR:  Both patches are needed
>>> 
>>> Below are the test results from my replication of Neil's test. It is
>>> readily apparent that both the 5.13.13 kernel AND the 5.13.13 kernel
>>> with the 82011c80b3ec fix exhibit the randomness in read times that
>>> were observed. The 5.13.13 kernel with both the 82011c80b3ec and
>>> f6e70aab9dfe fixes brings the performance back in line with the
>>> 5.12.15 kernel which I tested as a baseline.
>>> 
>>> Please forgive the inconsistency in sample counts. This was running as
>>> a while loop, and I just let it go long enough that the behavior was
>>> consistent. Only change to the VM between tests was the different
>>> kernel + a reboot. The testing PC had a consistent workload during the
>>> entire set of tests.
>>> 
>>> Test 0: 5.13.10 (base kernel in VM image, just for kicks)
>>> ==================================================
>>> Samples 30
>>> Min 6.839
>>> Max 19.998
>>> Median 9.638
>>> 75-P 10.898
>>> 95-P 12.939
>>> 99-P 18.005
>>> 
>>> Test 1: 5.12.15 (known good)
>>> ==================================================
>>> Samples 152
>>> Min 1.997
>>> Max 2.333
>>> Median 2.171
>>> 75-P 2.230
>>> 95-P 2.286
>>> 99-P 2.312
>>> 
>>> Test 2: 5.13.13 (known bad)
>>> ==================================================
>>> Samples 42
>>> Min 3.587
>>> Max 15.803
>>> Median 6.039
>>> 75-P 6.452
>>> 95-P 10.293
>>> 99-P 15.540
>>> 
>>> Test 3: 5.13.13 + 82011c80b3ec fix
>>> ==================================================
>>> Samples 44
>>> Min 4.309
>>> Max 37.040
>>> Median 6.615
>>> 75-P 10.224
>>> 95-P 19.516
>>> 99-P 36.650
>>> 
>>> Test 4: 5.13.13 + 82011c80b3ec fix + f6e70aab9dfe fix
>>> ==================================================
>>> Samples 131
>>> Min 2.013
>>> Max 2.397
>>> Median 2.169
>>> 75-P 2.211
>>> 95-P 2.283
>>> 99-P 2.348
>>> 
>>> I am going to run the kernel w/ both fixes over the weekend, but
>>> things look good at this point.
>>> 
>>> - mike
>> 
>> I've targeted Neil's fix for the first 5.15-rc NFSD pull request.
>> I'd like to have Mel's Reviewed-by or Acked-by, though.
>> 
>> I will add a Fixes: tag if Neil doesn't repost (no reason to at
>> this point) so the fix should get backported automatically to
>> recent stable kernels.
>> 
>> 
>>> On Fri, Aug 27, 2021 at 4:49 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
>>>> 
>>>> 
>>>>> On Aug 27, 2021, at 6:00 PM, Mike Javorski <mike.javorski@gmail.com> wrote:
>>>>> 
>>>>> OK, an update. Several hours of spaced out testing sessions and the
>>>>> first patch seems to have resolved the issue. There may be a very tiny
>>>>> bit of lag that still occurs when opening/processing new files, but so
>>>>> far on this kernel I have not had any multi-second freezes. I am still
>>>>> waiting on the kernel with Neil's patch to compile (compiling on this
>>>>> underpowered server so it's taking several hours), but I think the
>>>>> testing there will just be to see if I can show it works still, and
> >>>>> then to try and test in a memory-constrained VM, to see if I can
>>>>> recreate Neil's experiment. Likely will have to do this over the
>>>>> weekend given the kernel compile delay + fiddling with a VM.
>>>> 
>>>> Thanks for your testing!
>>>> 
>>>> 
>>>>> Chuck: I don't mean to overstep bounds, but is it possible to get that
>>>>> patch pulled into 5.13 stable? That may help things for several people
> >>>>> while 5.14 goes through its shakedown in archlinux prior to release.
>>>> 
>>>> The patch had a Fixes: tag, so it should get automatically backported
>>>> to every kernel that has the broken commit. If you don't see it in
>>>> a subsequent 5.13 stable kernel, you are free to ask the stable
>>>> maintainers to consider it.
>>>> 
>>>> 
>>>>> - mike
>>>>> 
>>>>> On Fri, Aug 27, 2021 at 10:07 AM Mike Javorski <mike.javorski@gmail.com> wrote:
>>>>>> 
>>>>>> Chuck:
>>>>>> I just booted a 5.13.13 kernel with your suggested patch. No freezes
>>>>>> on the first test, but that sometimes happens so I will let the server
>>>>>> settle some and try it again later in the day (which also would align
>>>>>> with Neil's comment on memory fragmentation being a contributor).
>>>>>> 
>>>>>> Neil:
>>>>>> I have started a compile with the above kernel + your patch to test
>>>>>> next unless you or Chuck determine that it isn't needed, or that I
> >>>>>> should test both patches discretely. As the above is already merged to
>>>>>> 5.14 it seemed logical to just add your patch on top.
>>>>>> 
>>>>>> I will also try to set up a vm to test your md5sum scenario with the
>>>>>> various kernels since it's a much faster thing to test.
>>>>>> 
>>>>>> - mike
>>>>>> 
>>>>>> On Fri, Aug 27, 2021 at 7:13 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>>> On Aug 27, 2021, at 3:14 AM, NeilBrown <neilb@suse.de> wrote:
>>>>>>>> 
>>>>>>>> Subject: [PATCH] SUNRPC: don't pause on incomplete allocation
>>>>>>>> 
>>>>>>>> alloc_pages_bulk_array() attempts to allocate at least one page based on
>>>>>>>> the provided pages, and then opportunistically allocates more if that
>>>>>>>> can be done without dropping the spinlock.
>>>>>>>> 
>>>>>>>> So if it returns fewer than requested, that could just mean that it
>>>>>>>> needed to drop the lock.  In that case, try again immediately.
>>>>>>>> 
>>>>>>>> Only pause for a time if no progress could be made.
>>>>>>> 
>>>>>>> The case I was worried about was "no pages available on the
>>>>>>> pcplist", in which case, alloc_pages_bulk_array() resorts
>>>>>>> to calling __alloc_pages() and returns only one new page.
>>>>>>> 
>>>>>>> "No progess" would mean even __alloc_pages() failed.
>>>>>>> 
>>>>>>> So this patch would behave essentially like the
>>>>>>> pre-alloc_pages_bulk_array() code: call alloc_page() for
>>>>>>> each empty struct_page in the array without pausing. That
>>>>>>> seems correct to me.
>>>>>>> 
>>>>>>> 
>>>>>>> I would add
>>>>>>> 
>>>>>>> Fixes: f6e70aab9dfe ("SUNRPC: refresh rq_pages using a bulk page allocator")
>>>>>>> 
>>>>>>> 
>>>>>>>> Signed-off-by: NeilBrown <neilb@suse.de>
>>>>>>>> ---
>>>>>>>> net/sunrpc/svc_xprt.c | 7 +++++--
>>>>>>>> 1 file changed, 5 insertions(+), 2 deletions(-)
>>>>>>>> 
>>>>>>>> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
>>>>>>>> index d66a8e44a1ae..99268dd95519 100644
>>>>>>>> --- a/net/sunrpc/svc_xprt.c
>>>>>>>> +++ b/net/sunrpc/svc_xprt.c
>>>>>>>> @@ -662,7 +662,7 @@ static int svc_alloc_arg(struct svc_rqst *rqstp)
>>>>>>>> {
>>>>>>>>    struct svc_serv *serv = rqstp->rq_server;
>>>>>>>>    struct xdr_buf *arg = &rqstp->rq_arg;
>>>>>>>> -     unsigned long pages, filled;
>>>>>>>> +     unsigned long pages, filled, prev;
>>>>>>>> 
>>>>>>>>    pages = (serv->sv_max_mesg + 2 * PAGE_SIZE) >> PAGE_SHIFT;
>>>>>>>>    if (pages > RPCSVC_MAXPAGES) {
>>>>>>>> @@ -672,11 +672,14 @@ static int svc_alloc_arg(struct svc_rqst *rqstp)
>>>>>>>>            pages = RPCSVC_MAXPAGES;
>>>>>>>>    }
>>>>>>>> 
>>>>>>>> -     for (;;) {
>>>>>>>> +     for (prev = 0;; prev = filled) {
>>>>>>>>            filled = alloc_pages_bulk_array(GFP_KERNEL, pages,
>>>>>>>>                                            rqstp->rq_pages);
>>>>>>>>            if (filled == pages)
>>>>>>>>                    break;
>>>>>>>> +             if (filled > prev)
>>>>>>>> +                     /* Made progress, don't sleep yet */
>>>>>>>> +                     continue;
>>>>>>>> 
>>>>>>>>            set_current_state(TASK_INTERRUPTIBLE);
>>>>>>>>            if (signalled() || kthread_should_stop()) {
>>>>>>> 
>>>>>>> --
>>>>>>> Chuck Lever
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>> 
>>>> --
>>>> Chuck Lever
>>>> 
>>>> 
>>>> 
>> 
>> --
>> Chuck Lever
>> 
>> 
>> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: NFS server regression in kernel 5.13 (tested w/ 5.13.9)
  2021-09-05  2:02                                                               ` Chuck Lever III
@ 2021-09-16  2:45                                                                 ` Mike Javorski
  2021-09-16 18:58                                                                   ` Chuck Lever III
  0 siblings, 1 reply; 47+ messages in thread
From: Mike Javorski @ 2021-09-16  2:45 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Mel Gorman, Neil Brown, Linux NFS Mailing List

Chuck:

I see that the patch was merged to Linus' branch, but there have been
2 stable patch releases since and the patch hasn't been pulled in. You
mentioned I should reach out to the stable maintainers in this
instance, is the stable@vger.kernel.org list the appropriate place to
make such a request?

Thanks.

- mike

On Sat, Sep 4, 2021 at 7:02 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
>
>
> > On Sep 4, 2021, at 1:41 PM, Mike Javorski <mike.javorski@gmail.com> wrote:
> >
> > Hi Chuck.
> >
> > I noticed that you sent in the 5.15 pull request but Neil's fix
> > (e38b3f20059426a0adbde014ff71071739ab5226 in your tree) missed the
> > pull and thus the fix isn't going to be backported to 5.14 in the near
> > term. Is there another 5.15 pull planned in the not too distant future
> > so this will get flagged for back-porting,
>
> Yes. The final version of Neil’s patch was just a little late for the initial v5.15 NFSD pull request (IMO) so it’s queued for the next PR, probably this week.
>
>
> > or do I need to reach out to someone to expressly pull it into 5.14? If the latter, can you
> > point me in the right direction of who to ask (I assume it's someone
> > other than Greg KH)?
> >
> > Thanks
> >
> > - mike
> >
> >
> >> On Sat, Aug 28, 2021 at 11:23 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
> >>
> >>
> >>
> >>>> On Aug 27, 2021, at 11:22 PM, Mike Javorski <mike.javorski@gmail.com> wrote:
> >>>
> >>> I had some time this evening (and the kernel finally compiled), and
> >>> wanted to get this tested.
> >>>
> >>> The TL;DR:  Both patches are needed
> >>>
> >>> Below are the test results from my replication of Neil's test. It is
> >>> readily apparent that both the 5.13.13 kernel AND the 5.13.13 kernel
> >>> with the 82011c80b3ec fix exhibit the randomness in read times that
> >>> were observed. The 5.13.13 kernel with both the 82011c80b3ec and
> >>> f6e70aab9dfe fixes brings the performance back in line with the
> >>> 5.12.15 kernel which I tested as a baseline.
> >>>
> >>> Please forgive the inconsistency in sample counts. This was running as
> >>> a while loop, and I just let it go long enough that the behavior was
> >>> consistent. Only change to the VM between tests was the different
> >>> kernel + a reboot. The testing PC had a consistent workload during the
> >>> entire set of tests.
> >>>
> >>> Test 0: 5.13.10 (base kernel in VM image, just for kicks)
> >>> ==================================================
> >>> Samples 30
> >>> Min 6.839
> >>> Max 19.998
> >>> Median 9.638
> >>> 75-P 10.898
> >>> 95-P 12.939
> >>> 99-P 18.005
> >>>
> >>> Test 1: 5.12.15 (known good)
> >>> ==================================================
> >>> Samples 152
> >>> Min 1.997
> >>> Max 2.333
> >>> Median 2.171
> >>> 75-P 2.230
> >>> 95-P 2.286
> >>> 99-P 2.312
> >>>
> >>> Test 2: 5.13.13 (known bad)
> >>> ==================================================
> >>> Samples 42
> >>> Min 3.587
> >>> Max 15.803
> >>> Median 6.039
> >>> 75-P 6.452
> >>> 95-P 10.293
> >>> 99-P 15.540
> >>>
> >>> Test 3: 5.13.13 + 82011c80b3ec fix
> >>> ==================================================
> >>> Samples 44
> >>> Min 4.309
> >>> Max 37.040
> >>> Median 6.615
> >>> 75-P 10.224
> >>> 95-P 19.516
> >>> 99-P 36.650
> >>>
> >>> Test 4: 5.13.13 + 82011c80b3ec fix + f6e70aab9dfe fix
> >>> ==================================================
> >>> Samples 131
> >>> Min 2.013
> >>> Max 2.397
> >>> Median 2.169
> >>> 75-P 2.211
> >>> 95-P 2.283
> >>> 99-P 2.348
> >>>
> >>> I am going to run the kernel w/ both fixes over the weekend, but
> >>> things look good at this point.
> >>>
> >>> - mike
> >>
> >> I've targeted Neil's fix for the first 5.15-rc NFSD pull request.
> >> I'd like to have Mel's Reviewed-by or Acked-by, though.
> >>
> >> I will add a Fixes: tag if Neil doesn't repost (no reason to at
> >> this point) so the fix should get backported automatically to
> >> recent stable kernels.
> >>
> >>
> >>> On Fri, Aug 27, 2021 at 4:49 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
> >>>>
> >>>>
> >>>>> On Aug 27, 2021, at 6:00 PM, Mike Javorski <mike.javorski@gmail.com> wrote:
> >>>>>
> >>>>> OK, an update. Several hours of spaced out testing sessions and the
> >>>>> first patch seems to have resolved the issue. There may be a very tiny
> >>>>> bit of lag that still occurs when opening/processing new files, but so
> >>>>> far on this kernel I have not had any multi-second freezes. I am still
> >>>>> waiting on the kernel with Neil's patch to compile (compiling on this
> >>>>> underpowered server so it's taking several hours), but I think the
> >>>>> testing there will just be to see if I can show it works still, and
> >>>>> then to try and test in a memory-constrained VM, to see if I can
> >>>>> recreate Neil's experiment. Likely will have to do this over the
> >>>>> weekend given the kernel compile delay + fiddling with a VM.
> >>>>
> >>>> Thanks for your testing!
> >>>>
> >>>>
> >>>>> Chuck: I don't mean to overstep bounds, but is it possible to get that
> >>>>> patch pulled into 5.13 stable? That may help things for several people
> >>>>> while 5.14 goes through its shakedown in archlinux prior to release.
> >>>>
> >>>> The patch had a Fixes: tag, so it should get automatically backported
> >>>> to every kernel that has the broken commit. If you don't see it in
> >>>> a subsequent 5.13 stable kernel, you are free to ask the stable
> >>>> maintainers to consider it.
> >>>>
> >>>>
> >>>>> - mike
> >>>>>
> >>>>> On Fri, Aug 27, 2021 at 10:07 AM Mike Javorski <mike.javorski@gmail.com> wrote:
> >>>>>>
> >>>>>> Chuck:
> >>>>>> I just booted a 5.13.13 kernel with your suggested patch. No freezes
> >>>>>> on the first test, but that sometimes happens so I will let the server
> >>>>>> settle some and try it again later in the day (which also would align
> >>>>>> with Neil's comment on memory fragmentation being a contributor).
> >>>>>>
> >>>>>> Neil:
> >>>>>> I have started a compile with the above kernel + your patch to test
> >>>>>> next unless you or Chuck determine that it isn't needed, or that I
> >>>>>> should test both patches discretely. As the above is already merged to
> >>>>>> 5.14 it seemed logical to just add your patch on top.
> >>>>>>
> >>>>>> I will also try to set up a vm to test your md5sum scenario with the
> >>>>>> various kernels since it's a much faster thing to test.
> >>>>>>
> >>>>>> - mike
> >>>>>>
> >>>>>> On Fri, Aug 27, 2021 at 7:13 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>>> On Aug 27, 2021, at 3:14 AM, NeilBrown <neilb@suse.de> wrote:
> >>>>>>>>
> >>>>>>>> Subject: [PATCH] SUNRPC: don't pause on incomplete allocation
> >>>>>>>>
> >>>>>>>> alloc_pages_bulk_array() attempts to allocate at least one page based on
> >>>>>>>> the provided pages, and then opportunistically allocates more if that
> >>>>>>>> can be done without dropping the spinlock.
> >>>>>>>>
> >>>>>>>> So if it returns fewer than requested, that could just mean that it
> >>>>>>>> needed to drop the lock.  In that case, try again immediately.
> >>>>>>>>
> >>>>>>>> Only pause for a time if no progress could be made.
> >>>>>>>
> >>>>>>> The case I was worried about was "no pages available on the
> >>>>>>> pcplist", in which case, alloc_pages_bulk_array() resorts
> >>>>>>> to calling __alloc_pages() and returns only one new page.
> >>>>>>>
> >>>>>>> "No progess" would mean even __alloc_pages() failed.
> >>>>>>>
> >>>>>>> So this patch would behave essentially like the
> >>>>>>> pre-alloc_pages_bulk_array() code: call alloc_page() for
> >>>>>>> each empty struct_page in the array without pausing. That
> >>>>>>> seems correct to me.
> >>>>>>>
> >>>>>>>
> >>>>>>> I would add
> >>>>>>>
> >>>>>>> Fixes: f6e70aab9dfe ("SUNRPC: refresh rq_pages using a bulk page allocator")
> >>>>>>>
> >>>>>>>
> >>>>>>>> Signed-off-by: NeilBrown <neilb@suse.de>
> >>>>>>>> ---
> >>>>>>>> net/sunrpc/svc_xprt.c | 7 +++++--
> >>>>>>>> 1 file changed, 5 insertions(+), 2 deletions(-)
> >>>>>>>>
> >>>>>>>> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> >>>>>>>> index d66a8e44a1ae..99268dd95519 100644
> >>>>>>>> --- a/net/sunrpc/svc_xprt.c
> >>>>>>>> +++ b/net/sunrpc/svc_xprt.c
> >>>>>>>> @@ -662,7 +662,7 @@ static int svc_alloc_arg(struct svc_rqst *rqstp)
> >>>>>>>> {
> >>>>>>>>    struct svc_serv *serv = rqstp->rq_server;
> >>>>>>>>    struct xdr_buf *arg = &rqstp->rq_arg;
> >>>>>>>> -     unsigned long pages, filled;
> >>>>>>>> +     unsigned long pages, filled, prev;
> >>>>>>>>
> >>>>>>>>    pages = (serv->sv_max_mesg + 2 * PAGE_SIZE) >> PAGE_SHIFT;
> >>>>>>>>    if (pages > RPCSVC_MAXPAGES) {
> >>>>>>>> @@ -672,11 +672,14 @@ static int svc_alloc_arg(struct svc_rqst *rqstp)
> >>>>>>>>            pages = RPCSVC_MAXPAGES;
> >>>>>>>>    }
> >>>>>>>>
> >>>>>>>> -     for (;;) {
> >>>>>>>> +     for (prev = 0;; prev = filled) {
> >>>>>>>>            filled = alloc_pages_bulk_array(GFP_KERNEL, pages,
> >>>>>>>>                                            rqstp->rq_pages);
> >>>>>>>>            if (filled == pages)
> >>>>>>>>                    break;
> >>>>>>>> +             if (filled > prev)
> >>>>>>>> +                     /* Made progress, don't sleep yet */
> >>>>>>>> +                     continue;
> >>>>>>>>
> >>>>>>>>            set_current_state(TASK_INTERRUPTIBLE);
> >>>>>>>>            if (signalled() || kthread_should_stop()) {
> >>>>>>>
> >>>>>>> --
> >>>>>>> Chuck Lever
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>
> >>>> --
> >>>> Chuck Lever
> >>>>
> >>>>
> >>>>
> >>
> >> --
> >> Chuck Lever
> >>
> >>
> >>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: NFS server regression in kernel 5.13 (tested w/ 5.13.9)
  2021-09-16  2:45                                                                 ` Mike Javorski
@ 2021-09-16 18:58                                                                   ` Chuck Lever III
  2021-09-16 19:21                                                                     ` Mike Javorski
  0 siblings, 1 reply; 47+ messages in thread
From: Chuck Lever III @ 2021-09-16 18:58 UTC (permalink / raw)
  To: Mike Javorski; +Cc: Mel Gorman, Neil Brown, Linux NFS Mailing List



> On Sep 15, 2021, at 10:45 PM, Mike Javorski <mike.javorski@gmail.com> wrote:
> 
> Chuck:
> 
> I see that the patch was merged to Linus' branch, but there have been
> 2 stable patch releases since and the patch hasn't been pulled in.

The v5.15 merge window just closed on Monday. I'm sure there is a
long queue of fixes to be pulled into stable. I'd give it more time.


> You
> mentioned I should reach out to the stable maintainers in this
> instance, is the stable@vger.kernel.org list the appropriate place to
> make such a request?

That could be correct, but the automated process has worked for me
long enough that I'm no longer sure how to make such a request.

--
Chuck Lever




^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: NFS server regression in kernel 5.13 (tested w/ 5.13.9)
  2021-09-16 18:58                                                                   ` Chuck Lever III
@ 2021-09-16 19:21                                                                     ` Mike Javorski
  2021-09-17 14:41                                                                       ` J. Bruce Fields
  0 siblings, 1 reply; 47+ messages in thread
From: Mike Javorski @ 2021-09-16 19:21 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Mel Gorman, Neil Brown, Linux NFS Mailing List

Thanks for the insight Chuck. I will give it a bit more time as you
suggest; I will ask if the arch devs can pull in the fix to their
kernel in the meantime.
I didn't see anyone making "backport fixes" requests in the stable
list archive recently, so it doesn't seem to be the norm, but I don't
see any other methods.

- mike

On Thu, Sep 16, 2021 at 11:58 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
>
>
>
> > On Sep 15, 2021, at 10:45 PM, Mike Javorski <mike.javorski@gmail.com> wrote:
> >
> > Chuck:
> >
> > I see that the patch was merged to Linus' branch, but there have been
> > 2 stable patch releases since and the patch hasn't been pulled in.
>
> The v5.15 merge window just closed on Monday. I'm sure there is a
> long queue of fixes to be pulled into stable. I'd give it more time.
>
>
> > You
> > mentioned I should reach out to the stable maintainers in this
> > instance, is the stable@vger.kernel.org list the appropriate place to
> > make such a request?
>
> That could be correct, but the automated process has worked for me
> long enough that I'm no longer sure how to make such a request.
>
> --
> Chuck Lever
>
>
>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: NFS server regression in kernel 5.13 (tested w/ 5.13.9)
  2021-09-16 19:21                                                                     ` Mike Javorski
@ 2021-09-17 14:41                                                                       ` J. Bruce Fields
  0 siblings, 0 replies; 47+ messages in thread
From: J. Bruce Fields @ 2021-09-17 14:41 UTC (permalink / raw)
  To: Mike Javorski
  Cc: Chuck Lever III, Mel Gorman, Neil Brown, Linux NFS Mailing List

On Thu, Sep 16, 2021 at 12:21:28PM -0700, Mike Javorski wrote:
> Thanks for the insight Chuck. I will give it a bit more time as you
> suggest; I will ask if the arch devs can pull in the fix to their
> kernel in the meantime.
> I didn't see anyone making "backport fixes" requests in the stable
> list archive recently, so it doesn't seem to be the norm, but I don't
> see any other methods.

Yes, I think you just send them mail.

But like Chuck says they're probably just working through a bit of a
backlog.

--b.

> 
> - mike
> 
> On Thu, Sep 16, 2021 at 11:58 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
> >
> >
> >
> > > On Sep 15, 2021, at 10:45 PM, Mike Javorski <mike.javorski@gmail.com> wrote:
> > >
> > > Chuck:
> > >
> > > I see that the patch was merged to Linus' branch, but there have been
> > > 2 stable patch releases since and the patch hasn't been pulled in.
> >
> > The v5.15 merge window just closed on Monday. I'm sure there is a
> > long queue of fixes to be pulled into stable. I'd give it more time.
> >
> >
> > > You
> > > mentioned I should reach out to the stable maintainers in this
> > > instance, is the stable@vger.kernel.org list the appropriate place to
> > > make such a request?
> >
> > That could be correct, but the automated process has worked for me
> > long enough that I'm no longer sure how to make such a request.
> >
> > --
> > Chuck Lever
> >
> >
> >

^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2021-09-17 14:41 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-08 22:37 NFS server regression in kernel 5.13 (tested w/ 5.13.9) Mike Javorski
2021-08-08 22:47 ` Chuck Lever III
2021-08-08 23:23   ` Mike Javorski
2021-08-09  0:01 ` NeilBrown
2021-08-09  0:28   ` Mike Javorski
2021-08-10  0:50     ` Mike Javorski
2021-08-10  1:28       ` NeilBrown
2021-08-10 11:54         ` Daire Byrne
2021-08-13  1:51         ` Mike Javorski
2021-08-13  2:39           ` NeilBrown
2021-08-13  2:53             ` Mike Javorski
2021-08-15  1:23               ` Mike Javorski
2021-08-16  1:20                 ` NeilBrown
2021-08-16 13:21                   ` Chuck Lever III
2021-08-16 16:25                     ` Mike Javorski
2021-08-16 23:01                       ` NeilBrown
2021-08-20  0:31                         ` NeilBrown
2021-08-20  0:52                           ` Mike Javorski
2021-08-22  0:17                             ` Mike Javorski
2021-08-22  3:41                               ` NeilBrown
2021-08-22  4:05                                 ` Mike Javorski
2021-08-22 22:00                                   ` NeilBrown
2021-08-26 19:34                                     ` Mike Javorski
2021-08-26 21:44                                       ` NeilBrown
2021-08-27  0:07                                         ` Mike Javorski
2021-08-27  5:27                                           ` NeilBrown
2021-08-27  6:11                                             ` Mike Javorski
2021-08-27  7:14                                               ` NeilBrown
2021-08-27 14:13                                                 ` Chuck Lever III
2021-08-27 17:07                                                   ` Mike Javorski
2021-08-27 22:00                                                     ` Mike Javorski
2021-08-27 23:49                                                       ` Chuck Lever III
2021-08-28  3:22                                                         ` Mike Javorski
2021-08-28 18:23                                                           ` Chuck Lever III
2021-08-29 22:28                                                             ` [PATCH] MM: clarify effort used in alloc_pages_bulk_*() NeilBrown
2021-08-30  9:11                                                               ` Mel Gorman
2021-08-29 22:36                                                             ` [PATCH] SUNRPC: don't pause on incomplete allocation NeilBrown
2021-08-30  9:12                                                               ` Mel Gorman
2021-08-30 20:46                                                               ` J. Bruce Fields
2021-09-04 17:41                                                             ` NFS server regression in kernel 5.13 (tested w/ 5.13.9) Mike Javorski
2021-09-05  2:02                                                               ` Chuck Lever III
2021-09-16  2:45                                                                 ` Mike Javorski
2021-09-16 18:58                                                                   ` Chuck Lever III
2021-09-16 19:21                                                                     ` Mike Javorski
2021-09-17 14:41                                                                       ` J. Bruce Fields
2021-08-16 16:09                   ` Mike Javorski
2021-08-16 23:04                     ` NeilBrown
