From: "NeilBrown" <neilb@suse.de>
To: "Mike Javorski" <mike.javorski@gmail.com>, Mel Gorman <mgorman@suse.com>
Cc: "Chuck Lever III" <chuck.lever@oracle.com>,
	"Linux NFS Mailing List" <linux-nfs@vger.kernel.org>
Subject: Re: NFS server regression in kernel 5.13 (tested w/ 5.13.9)
Date: Fri, 27 Aug 2021 15:27:09 +1000
Message-ID: <163004202961.7591.12633163545286005205@noble.neil.brown.name>
In-Reply-To: <CAOv1SKC+3LXhM+L9MwU2D03bpeof55-g+i=r3SWEjVWcPVCi8Q@mail.gmail.com>


[[Mel: if you read through to the end you'll see why I cc:ed you on this]]

On Fri, 27 Aug 2021, Mike Javorski wrote:
> I just tried the same mount with 4 different nfsvers values: 3, 4.0, 4.1 and 4.2
> 
> At first I thought it might be "working" because I only got freezes
> with 4.2 at first, but I went back and re-tested (to be sure) and got
> freezes with all 4 versions. So the nfsvers setting doesn't seem to
> have an impact. I did verify at each pass that the 'nfsvers=' value
> was present and correct in the mount output.
> 
> FYI: another user posted on the archlinux reddit with a similar issue,
> I suggested they try with a 5.12 kernel and that "solved" the issue
> for them as well.

Well... I have good news and I have bad news.

First the good.
I reviewed all the symptoms again, and browsed the commits between
working and not-working, and the only pattern that made any sense was
that there was some issue with memory allocation.  The pauses - I
reasoned - were most likely pauses while allocating memory.

So instead of testing in a VM with 2G of RAM, I tried 512MB, and
suddenly the problem was trivial to reproduce.  Specifically I created a
(sparse) 1GB file on the test VM, exported it over NFS, and ran "md5sum"
on the file from an NFS client.  With 5.12 this reliably takes about 90 seconds
(as it does with 2G RAM).  On 5.13 with 512MB RAM, it usually takes a lot
longer: 5, 6, 7, or 8 minutes (and assorted seconds).
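
The test above can be approximated with a small script.  This is only a
sketch, not the exact procedure used: the function names and paths are
hypothetical, and Python's hashlib stands in for the "md5sum" command.
Exporting the file over NFS and mounting it on the client are assumed to
be done separately.

```python
import hashlib
import os
import sys
import time


def make_sparse_file(path, size_bytes):
    """Create a sparse file: truncate() sets the length without writing data."""
    with open(path, "wb") as f:
        f.truncate(size_bytes)
    return os.path.getsize(path)


def timed_md5(path, chunk_size=1 << 20):
    """Read the file in chunks; return (hex digest, elapsed seconds)."""
    digest = hashlib.md5()
    start = time.monotonic()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest(), time.monotonic() - start


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage (the paths are assumptions, not from the report):
    #   on the server:  python3 repro.py create /export/sparse-1g
    #   on the client:  python3 repro.py read /mnt/nfs/sparse-1g
    mode, path = sys.argv[1], sys.argv[2]
    if mode == "create":
        print(make_sparse_file(path, 1 << 30), "bytes")
    else:
        digest, seconds = timed_md5(path)
        print(f"{digest}  {seconds:.1f}s")
```

Repeated reads on the same client may be served from its page cache, so
timings are only comparable when each run starts from a fresh mount or
after the client cache has been dropped.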

The most questionable nfsd/memory-related patch in 5.13 is

 Commit f6e70aab9dfe ("SUNRPC: refresh rq_pages using a bulk page allocator")

I reverted that and now the problem is no longer there.  Gone.  90 seconds
every time.

Now the bad news: I don't know why.  That patch should be a good patch,
with a small performance improvement, particularly at very high loads.
(maybe even a big improvement at very high loads).
The problem must be in alloc_pages_bulk_array(), which is a new
interface, so it is not possible to bisect.

So I might have a look at the code next week, but I've cc:ed Mel Gorman
in case he comes up with some ideas sooner.

For now, you can just revert that patch.

Thanks for all the testing you did!!  It certainly helped.

NeilBrown

