From: Dave Chinner <david@fromorbit.com>
To: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, xfs@oss.sgi.com
Subject: [PATCH 03/12] vmscan: reduce wind up shrinker->nr when shrinker can't do work
Date: Thu,  2 Jun 2011 17:00:58 +1000
Message-ID: <1306998067-27659-4-git-send-email-david@fromorbit.com>
In-Reply-To: <1306998067-27659-1-git-send-email-david@fromorbit.com>

From: Dave Chinner <dchinner@redhat.com>

When a shrinker returns -1 to shrink_slab() to indicate it cannot do
any work given the current memory reclaim requirements, it adds the
entire total_scan count to shrinker->nr. The idea behind this is that
when the shrinker is next called and can do work, it will do the work
of the previously aborted shrinker call as well.
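
For reference, this is roughly how that deferral works (a simplified
sketch of the shrink_slab() logic this patch applies to, ignoring the
concurrency fixes made earlier in the series; not the exact kernel
code):

	max_pass = do_shrinker_shrink(shrinker, shrink, 0);	/* cache size */
	delta = (4 * nr_pages_scanned) / shrinker->seeks;
	delta *= max_pass;
	do_div(delta, lru_pages + 1);

	total_scan = shrinker->nr + delta;	/* deferred work + new work */
	shrinker->nr = 0;

	while (total_scan >= SHRINK_BATCH) {
		if (do_shrinker_shrink(shrinker, shrink, SHRINK_BATCH) == -1)
			break;			/* e.g. GFP_NOFS: no work possible */
		total_scan -= SHRINK_BATCH;
	}

	shrinker->nr += total_scan;	/* whatever wasn't scanned is deferred */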

However, if a filesystem is doing lots of allocation with GFP_NOFS
set, then we get many, many more aborts from the shrinkers than we
do successful calls. The result is that shrinker->nr winds up to
its maximum permissible value (twice the current cache size) and
then when the next shrinker call that can do work is issued, it
has enough scan count built up to free the entire cache twice over.

This manifests itself in the cache going from full to empty in a
matter of seconds, even when only a small part of the cache is
needed to be emptied to free sufficient memory.

Under metadata intensive workloads on ext4 and XFS, I'm seeing the
VFS caches increase memory consumption up to 75% of memory (no page
cache pressure) over a period of 30-60s, and then the shrinker
empties them down to zero in the space of 2-3s. This cycle repeats
over and over again, with the shrinker completely trashing the іnode
and dentry caches every minute or so the workload continues.

This behaviour was made obvious by the shrink_slab tracepoints added
earlier in the series, and made worse by the patch that corrected
the concurrent accounting of shrinker->nr.

To avoid this problem, stop repeated small increments of the total
scan value from winding shrinker->nr up to a value that can cause
the entire cache to be freed. We still need to allow it to wind up,
so use the delta as the "large scan" threshold check - if the delta
is more than a quarter of the entire cache size, then it is a large
scan and is allowed to cause lots of windup because we clearly need
to free lots of memory.

If it isn't a large scan then limit the total scan to half the size
of the cache so that windup never increases to consume the whole
cache. Reducing the total scan limit further does not allow enough
wind-up to maintain the current levels of performance, whilst a
higher threshold does not prevent the windup from freeing the entire
cache under sustained workloads.
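
As an illustration of the two thresholds, using made-up numbers (these
are purely hypothetical, not taken from the measurements above): for a
cache with max_pass = 100,000 freeable objects,

	delta =  1,000	small scan (delta < max_pass / 4 = 25,000), so
			total_scan is capped at max_pass / 2 = 50,000 and
			repeated GFP_NOFS aborts can never defer more than
			half the cache to a single later call.

	delta = 30,000	large scan (delta >= max_pass / 4), so no new cap
			applies and total_scan may still wind up towards the
			existing 2 * max_pass ceiling, because reclaim clearly
			needs a lot of memory freed.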

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 mm/vmscan.c |   14 ++++++++++++++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index dce2767..3688f47 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -277,6 +277,20 @@ unsigned long shrink_slab(struct shrink_control *shrink,
 		}
 
 		/*
+		 * Avoid excessive windup on filesystem shrinkers due to large
+		 * numbers of GFP_NOFS allocations causing the shrinkers to
+		 * return -1 all the time. This results in a large nr being
+		 * built up so when a shrink that can do some work comes along
+		 * it empties the entire cache due to nr >>> max_pass.  This is
+		 * bad for sustaining a working set in memory.
+		 *
+		 * Hence only allow nr to go large when a large delta is
+		 * calculated.
+		 */
+		if (delta < max_pass / 4)
+			total_scan = min(total_scan, max_pass / 2);
+
+		/*
 		 * Avoid risking looping forever due to too large nr value:
 		 * never try to free more than twice the estimate number of
 		 * freeable entries.
-- 
1.7.5.1


Thread overview: 31+ messages
2011-06-02  7:00 [PATCH 0/12] Per superblock cache reclaim Dave Chinner
2011-06-02  7:00 ` [PATCH 01/12] vmscan: add shrink_slab tracepoints Dave Chinner
2011-06-20  0:44   ` KOSAKI Motohiro
2011-06-20  0:53     ` Dave Chinner
2011-06-02  7:00 ` [PATCH 02/12] vmscan: shrinker->nr updates race and go wrong Dave Chinner
2011-06-20  0:46   ` KOSAKI Motohiro
2011-06-20  1:25     ` Dave Chinner
2011-06-20  4:30       ` KOSAKI Motohiro
2011-06-02  7:00 ` Dave Chinner [this message]
2011-06-20  0:51   ` [PATCH 03/12] vmscan: reduce wind up shrinker->nr when shrinker can't do work KOSAKI Motohiro
2011-06-21  5:09     ` Dave Chinner
2011-06-21  5:27       ` KOSAKI Motohiro
2011-06-02  7:00 ` [PATCH 04/12] vmscan: add customisable shrinker batch size Dave Chinner
2011-06-02  7:01 ` [PATCH 05/12] inode: convert inode_stat.nr_unused to per-cpu counters Dave Chinner
2011-06-02  7:01 ` [PATCH 06/12] inode: Make unused inode LRU per superblock Dave Chinner
2011-06-04  0:25   ` Al Viro
2011-06-04  1:40     ` Dave Chinner
2011-06-02  7:01 ` [PATCH 07/12] inode: move to per-sb LRU locks Dave Chinner
2011-06-02  7:01 ` [PATCH 08/12] superblock: introduce per-sb cache shrinker infrastructure Dave Chinner
2011-06-04  0:42   ` Al Viro
2011-06-04  1:52     ` Dave Chinner
2011-06-04 14:08       ` Christoph Hellwig
2011-06-04 14:19         ` Al Viro
2011-06-04 14:24           ` Al Viro
2011-06-02  7:01 ` [PATCH 09/12] inode: remove iprune_sem Dave Chinner
2011-06-02  7:01 ` [PATCH 10/12] superblock: add filesystem shrinker operations Dave Chinner
2011-06-02  7:01 ` [PATCH 11/12] vfs: increase shrinker batch size Dave Chinner
2011-06-02  9:30   ` Nicolas Kaiser
2011-06-02  7:01 ` [PATCH 12/12] xfs: make use of new shrinker callout for the inode cache Dave Chinner
2011-06-16 11:33 ` [PATCH 0/12] Per superblock cache reclaim Christoph Hellwig
2011-06-17  3:35   ` KOSAKI Motohiro
