All of lore.kernel.org
 help / color / mirror / Atom feed
From: "NeilBrown" <neilb@suse.de>
To: "Mike Javorski" <mike.javorski@gmail.com>
Cc: "Mel Gorman" <mgorman@suse.com>,
	"Chuck Lever III" <chuck.lever@oracle.com>,
	"Linux NFS Mailing List" <linux-nfs@vger.kernel.org>
Subject: Re: NFS server regression in kernel 5.13 (tested w/ 5.13.9)
Date: Fri, 27 Aug 2021 17:14:45 +1000	[thread overview]
Message-ID: <163004848514.7591.2757618782251492498@noble.neil.brown.name> (raw)
In-Reply-To: <CAOv1SKDTcg5WDp5zf3ZGL0enJ7K693W-9TMYKcrgweyzp6Qjhg@mail.gmail.com>

On Fri, 27 Aug 2021, Mike Javorski wrote:
> Neil:
> 
> I am actually compiling a 5.13.13 kernel with the patch that Chuck
> suggested earlier right now. I am doing the full compile matching the
> distro compile as I don't have a targeted kernel config ready to go
> (it's been years), and I want to test like for like anyway. It should
> be ready to install in the AM, my time, so I will test with that first
> tomorrow and see if it resolves the issue, if not, I will report back
> and then try your revert suggestion. On the issue of memory though, my
> server has 16GB of memory (and free currently shows ~1GB unused, and
> ~11GB in buffers/caches), so this really shouldn't be an available
> memory issue, but I guess we'll find out.
> 
> Thanks for the info.

Take your time.

Just FYI, the fix Chuck identified doesn't match your symptoms.
That bug can only occur if
   /sys/module/sunrpc/parameters/svc_rpc_per_connection_limit
is non-zero.  When it does occur, the TCP connection completely
freezes - no further traffic.  IT won't even close.

I took a break and got some fresh air and now I understand the problem.
Please try the patch below, not the revert I suggested.
The pause can, I think, be caused by fragmented memory - not just low
memory.  If only 1/16 of your memory is free, it could easily be
fragmented.

Thanks,
NeilBrown

Subject: [PATCH] SUNRPC: don't pause on incomplete allocation

alloc_pages_bulk_array() attempts to allocate at least one page based on
the provided pages, and then opportunistically allocates more if that
can be done without dropping the spinlock.

So if it returns fewer than requested, that could just mean that it
needed to drop the lock.  In that case, try again immediately.

Only pause for a time if no progress could be made.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 net/sunrpc/svc_xprt.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index d66a8e44a1ae..99268dd95519 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -662,7 +662,7 @@ static int svc_alloc_arg(struct svc_rqst *rqstp)
 {
 	struct svc_serv *serv = rqstp->rq_server;
 	struct xdr_buf *arg = &rqstp->rq_arg;
-	unsigned long pages, filled;
+	unsigned long pages, filled, prev;
 
 	pages = (serv->sv_max_mesg + 2 * PAGE_SIZE) >> PAGE_SHIFT;
 	if (pages > RPCSVC_MAXPAGES) {
@@ -672,11 +672,14 @@ static int svc_alloc_arg(struct svc_rqst *rqstp)
 		pages = RPCSVC_MAXPAGES;
 	}
 
-	for (;;) {
+	for (prev = 0;; prev = filled) {
 		filled = alloc_pages_bulk_array(GFP_KERNEL, pages,
 						rqstp->rq_pages);
 		if (filled == pages)
 			break;
+		if (filled > prev)
+			/* Made progress, don't sleep yet */
+			continue;
 
 		set_current_state(TASK_INTERRUPTIBLE);
 		if (signalled() || kthread_should_stop()) {

  reply	other threads:[~2021-08-27  7:14 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-08 22:37 NFS server regression in kernel 5.13 (tested w/ 5.13.9) Mike Javorski
2021-08-08 22:47 ` Chuck Lever III
2021-08-08 23:23   ` Mike Javorski
2021-08-09  0:01 ` NeilBrown
2021-08-09  0:28   ` Mike Javorski
2021-08-10  0:50     ` Mike Javorski
2021-08-10  1:28       ` NeilBrown
2021-08-10 11:54         ` Daire Byrne
2021-08-13  1:51         ` Mike Javorski
2021-08-13  2:39           ` NeilBrown
2021-08-13  2:53             ` Mike Javorski
2021-08-15  1:23               ` Mike Javorski
2021-08-16  1:20                 ` NeilBrown
2021-08-16 13:21                   ` Chuck Lever III
2021-08-16 16:25                     ` Mike Javorski
2021-08-16 23:01                       ` NeilBrown
2021-08-20  0:31                         ` NeilBrown
2021-08-20  0:52                           ` Mike Javorski
2021-08-22  0:17                             ` Mike Javorski
2021-08-22  3:41                               ` NeilBrown
2021-08-22  4:05                                 ` Mike Javorski
2021-08-22 22:00                                   ` NeilBrown
2021-08-26 19:34                                     ` Mike Javorski
2021-08-26 21:44                                       ` NeilBrown
2021-08-27  0:07                                         ` Mike Javorski
2021-08-27  5:27                                           ` NeilBrown
2021-08-27  6:11                                             ` Mike Javorski
2021-08-27  7:14                                               ` NeilBrown [this message]
2021-08-27 14:13                                                 ` Chuck Lever III
2021-08-27 17:07                                                   ` Mike Javorski
2021-08-27 22:00                                                     ` Mike Javorski
2021-08-27 23:49                                                       ` Chuck Lever III
2021-08-28  3:22                                                         ` Mike Javorski
2021-08-28 18:23                                                           ` Chuck Lever III
2021-08-29 22:28                                                             ` [PATCH] MM: clarify effort used in alloc_pages_bulk_*() NeilBrown
2021-08-30  9:11                                                               ` Mel Gorman
2021-08-29 22:36                                                             ` [PATCH] SUNRPC: don't pause on incomplete allocation NeilBrown
2021-08-30  9:12                                                               ` Mel Gorman
2021-08-30 20:46                                                               ` J. Bruce Fields
2021-09-04 17:41                                                             ` NFS server regression in kernel 5.13 (tested w/ 5.13.9) Mike Javorski
2021-09-05  2:02                                                               ` Chuck Lever III
2021-09-16  2:45                                                                 ` Mike Javorski
2021-09-16 18:58                                                                   ` Chuck Lever III
2021-09-16 19:21                                                                     ` Mike Javorski
2021-09-17 14:41                                                                       ` J. Bruce Fields
2021-08-16 16:09                   ` Mike Javorski
2021-08-16 23:04                     ` NeilBrown

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=163004848514.7591.2757618782251492498@noble.neil.brown.name \
    --to=neilb@suse.de \
    --cc=chuck.lever@oracle.com \
    --cc=linux-nfs@vger.kernel.org \
    --cc=mgorman@suse.com \
    --cc=mike.javorski@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.