From: NeilBrown <neilb@suse.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org, NFS List <linux-nfs@vger.kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>
Subject: [PATCH] MM: increase safety margin provided by PF_LESS_THROTTLE
Date: Wed, 18 May 2016 13:41:20 +1000
Message-ID: <87futgowwv.fsf@notabene.neil.brown.name>

When nfsd is exporting a filesystem over NFS which is then NFS-mounted
on the local machine there is a risk of deadlock.  This happens when
there are lots of dirty pages in the NFS filesystem and they cause
NFSD to be throttled, either in throttle_vm_writeout() or in
balance_dirty_pages().

To avoid this problem the PF_LESS_THROTTLE flag is set for NFSD
threads and it provides a 25% increase to the limits that affect NFSD.
Any process writing to an NFS filesystem will be throttled well
before the number of dirty NFS pages reaches the limit imposed on
NFSD, so NFSD will not deadlock on pages that it needs to write out.
At least it shouldn't.

To avoid performing too many calculations, all processes are allowed
a small excess margin: ratelimit_pages.

ratelimit_pages is set so that if a thread on every CPU uses the
entire margin, the total will only go 3% over the limit, and this is
much less than the 25% bonus that PF_LESS_THROTTLE provides, so this
margin shouldn't be a problem.  But it is.

The "total memory" that these 3% and 25% are calculated against is not
really total memory but "global_dirtyable_memory()", which doesn't
include anonymous memory, just free memory and page-cache memory.

The "ratelimit_pages" number is based on whatever
global_dirtyable_memory() was at the last CPU hotplug, which might not
be what you expect, but is probably close to the total freeable memory.

The throttle threshold uses global_dirtyable_memory() at the moment
when the throttling happens, which could be much less than at the last
CPU hotplug.  So if lots of anonymous memory has been allocated, thus
pushing out lots of page-cache pages, then NFSD might end up being
throttled due to dirty NFS pages because the "25%" bonus it gets is
calculated against a rather small amount of dirtyable memory, while
the "3%" margin that other processes are allowed to dirty without
penalty is calculated against a much larger number.
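
To make the mismatch concrete, here is a small user-space sketch of the
arithmetic.  It is not kernel code: the function names are hypothetical,
all values are in pages, and it assumes the per-CPU allowance is the
dirty threshold divided by (number of CPUs * 32), which is what gives
the roughly 3% total described above.

```c
/* Hypothetical user-space model, not kernel code.  Values are in pages. */

/*
 * Total overshoot permitted by the rate limit if a thread on every CPU
 * uses its whole allowance.  The allowance is fixed at the last CPU
 * hotplug, so it is computed from the threshold as it was then.
 * Assumption: per-CPU allowance = threshold / (ncpus * 32).
 */
unsigned long ratelimit_margin(unsigned long thresh_at_hotplug,
			       unsigned long ncpus)
{
	unsigned long ratelimit_pages = thresh_at_hotplug / (ncpus * 32);

	return ratelimit_pages * ncpus;
}

/*
 * Extra headroom PF_LESS_THROTTLE gets before this patch: 25% of the
 * threshold as computed at the moment of throttling.
 */
unsigned long less_throttle_bonus(unsigned long thresh_now)
{
	return thresh_now / 4;
}
```

With illustrative numbers (not measurements), a threshold of 1000000
pages at hotplug on 8 CPUs gives ratelimit_margin() = 31248.  If
anonymous memory later shrinks the threshold to 100000 pages,
less_throttle_bonus() is only 25000, so the "3%" margin is larger than
the "25%" bonus.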

To remove this possibility of deadlock we need to make sure that the
margin granted to PF_LESS_THROTTLE exceeds the rate-limit margin.
Simply adding ratelimit_pages isn't enough, as it would need to be
multiplied by the number of CPUs.

So add "global_wb_domain.dirty_limit / 32" as that more accurately
reflects the current total over-shoot margin.  This ensures that the
number of dirty NFS pages never gets so high that nfsd will be
throttled waiting for them to be written.
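
The effect of the extra term can be sketched in user space as follows.
This only models the arithmetic of the adjustment; the function names
are hypothetical, and the dirty_limit argument stands in for
global_wb_domain.dirty_limit.

```c
/* Hypothetical user-space model of the adjustment.  Values are in pages. */

/* Threshold for a PF_LESS_THROTTLE task before this patch: +25%. */
unsigned long thresh_before_patch(unsigned long thresh)
{
	return thresh + thresh / 4;
}

/*
 * Threshold after this patch: +25%, plus dirty_limit / 32 to cover the
 * total over-shoot margin that other processes are allowed.
 */
unsigned long thresh_after_patch(unsigned long thresh,
				 unsigned long dirty_limit)
{
	return thresh + thresh / 4 + dirty_limit / 32;
}
```

With illustrative values of thresh = 100000 and dirty_limit = 1000000,
the threshold goes from 125000 to 156250 pages, so the headroom above
thresh (56250) exceeds an over-shoot of dirty_limit / 32 (31250).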

Signed-off-by: NeilBrown <neilb@suse.com>

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index bc5149d5ec38..bbdcd7ccef57 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -407,8 +407,8 @@ static void domain_dirty_limits(struct dirty_throttle_control *dtc)
 		bg_thresh = thresh / 2;
 	tsk = current;
 	if (tsk->flags & PF_LESS_THROTTLE || rt_task(tsk)) {
-		bg_thresh += bg_thresh / 4;
-		thresh += thresh / 4;
+		bg_thresh += bg_thresh / 4 + global_wb_domain.dirty_limit / 32;
+		thresh += thresh / 4 + global_wb_domain.dirty_limit / 32;
 	}
 	dtc->thresh = thresh;
 	dtc->bg_thresh = bg_thresh;
