All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Johannes Weiner <hannes@cmpxchg.org>,
	Tejun Heo <tj@kernel.org>, Rik van Riel <riel@redhat.com>,
	Mel Gorman <mgorman@suse.de>,
	Wu Fengguang <fengguang.wu@intel.com>,
	Michal Hocko <mhocko@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 3.10 11/79] mm/page-writeback.c: fix dirty_balance_reserve subtraction from dirtyable memory
Date: Tue, 11 Feb 2014 11:05:15 -0800	[thread overview]
Message-ID: <20140211184721.257646791@linuxfoundation.org> (raw)
In-Reply-To: <20140211184720.928667275@linuxfoundation.org>

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Johannes Weiner <hannes@cmpxchg.org>

commit a804552b9a15c931cfc2a92a2e0aed1add8b580a upstream.

Tejun reported stuttering and latency spikes on a system where random
tasks would enter direct reclaim and get stuck on dirty pages.  Around
50% of memory was occupied by tmpfs backed by an SSD, and another disk
(rotating) was reading and writing at max speed to shrink a partition.

: The problem was pretty ridiculous.  It's a 8gig machine w/ one ssd and 10k
: rpm harddrive and I could reliably reproduce constant stuttering every
: several seconds for as long as buffered IO was going on on the hard drive
: either with tmpfs occupying somewhere above 4gig or a test program which
: allocates about the same amount of anon memory.  Although swap usage was
: zero, turning off swap also made the problem go away too.
:
: The trigger conditions seem quite plausible - high anon memory usage w/
: heavy buffered IO and swap configured - and it's highly likely that this
: is happening in the wild too.  (this can happen with copying large files
: to usb sticks too, right?)

This patch (of 2):

The dirty_balance_reserve is an approximation of the fraction of free
pages that the page allocator does not make available for page cache
allocations.  As a result, it has to be taken into account when
calculating the amount of "dirtyable memory", the baseline to which
dirty_background_ratio and dirty_ratio are applied.

However, currently the reserve is subtracted from the sum of free and
reclaimable pages, which is non-sensical and leads to erroneous results
when the system is dominated by unreclaimable pages and the
dirty_balance_reserve is bigger than free+reclaimable.  In that case, at
least the already allocated cache should be considered dirtyable.

Fix the calculation by subtracting the reserve from the amount of free
pages, then adding the reclaimable pages on top.

[akpm@linux-foundation.org: fix CONFIG_HIGHMEM build]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Tejun Heo <tj@kernel.org>
Tested-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/page-writeback.c |   55 ++++++++++++++++++++++------------------------------
 1 file changed, 24 insertions(+), 31 deletions(-)

--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -188,6 +188,25 @@ static unsigned long writeout_period_tim
  * global dirtyable memory first.
  */
 
+/**
+ * zone_dirtyable_memory - number of dirtyable pages in a zone
+ * @zone: the zone
+ *
+ * Returns the zone's number of pages potentially available for dirty
+ * page cache.  This is the base value for the per-zone dirty limits.
+ */
+static unsigned long zone_dirtyable_memory(struct zone *zone)
+{
+	unsigned long nr_pages;
+
+	nr_pages = zone_page_state(zone, NR_FREE_PAGES);
+	nr_pages -= min(nr_pages, zone->dirty_balance_reserve);
+
+	nr_pages += zone_reclaimable_pages(zone);
+
+	return nr_pages;
+}
+
 static unsigned long highmem_dirtyable_memory(unsigned long total)
 {
 #ifdef CONFIG_HIGHMEM
@@ -195,11 +214,9 @@ static unsigned long highmem_dirtyable_m
 	unsigned long x = 0;
 
 	for_each_node_state(node, N_HIGH_MEMORY) {
-		struct zone *z =
-			&NODE_DATA(node)->node_zones[ZONE_HIGHMEM];
+		struct zone *z = &NODE_DATA(node)->node_zones[ZONE_HIGHMEM];
 
-		x += zone_page_state(z, NR_FREE_PAGES) +
-		     zone_reclaimable_pages(z) - z->dirty_balance_reserve;
+		x += zone_dirtyable_memory(z);
 	}
 	/*
 	 * Unreclaimable memory (kernel memory or anonymous memory
@@ -235,9 +252,11 @@ static unsigned long global_dirtyable_me
 {
 	unsigned long x;
 
-	x = global_page_state(NR_FREE_PAGES) + global_reclaimable_pages();
+	x = global_page_state(NR_FREE_PAGES);
 	x -= min(x, dirty_balance_reserve);
 
+	x += global_reclaimable_pages();
+
 	if (!vm_highmem_is_dirtyable)
 		x -= highmem_dirtyable_memory(x);
 
@@ -289,32 +308,6 @@ void global_dirty_limits(unsigned long *
 }
 
 /**
- * zone_dirtyable_memory - number of dirtyable pages in a zone
- * @zone: the zone
- *
- * Returns the zone's number of pages potentially available for dirty
- * page cache.  This is the base value for the per-zone dirty limits.
- */
-static unsigned long zone_dirtyable_memory(struct zone *zone)
-{
-	/*
-	 * The effective global number of dirtyable pages may exclude
-	 * highmem as a big-picture measure to keep the ratio between
-	 * dirty memory and lowmem reasonable.
-	 *
-	 * But this function is purely about the individual zone and a
-	 * highmem zone can hold its share of dirty pages, so we don't
-	 * care about vm_highmem_is_dirtyable here.
-	 */
-	unsigned long nr_pages = zone_page_state(zone, NR_FREE_PAGES) +
-		zone_reclaimable_pages(zone);
-
-	/* don't allow this to underflow */
-	nr_pages -= min(nr_pages, zone->dirty_balance_reserve);
-	return nr_pages;
-}
-
-/**
  * zone_dirty_limit - maximum number of dirty pages allowed in a zone
  * @zone: the zone
  *



  parent reply	other threads:[~2014-02-11 19:28 UTC|newest]

Thread overview: 79+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-02-11 19:05 [PATCH 3.10 00/79] 3.10.30-stable review Greg Kroah-Hartman
2014-02-11 19:05 ` [PATCH 3.10 01/79] SELinux: Fix memory leak upon loading policy Greg Kroah-Hartman
2014-02-11 19:05   ` Greg Kroah-Hartman
2014-02-11 19:05 ` [PATCH 3.10 02/79] tracing: Have trace buffer point back to trace_array Greg Kroah-Hartman
2014-02-11 19:05 ` [PATCH 3.10 03/79] tracing: Check if tracing is enabled in trace_puts() Greg Kroah-Hartman
2014-02-11 19:05 ` [PATCH 3.10 04/79] arch/sh/kernel/kgdb.c: add missing #include <linux/sched.h> Greg Kroah-Hartman
2014-02-11 19:05 ` [PATCH 3.10 05/79] intel-iommu: fix off-by-one in pagetable freeing Greg Kroah-Hartman
2014-02-11 19:05 ` [PATCH 3.10 06/79] Revert "EISA: Initialize device before its resources" Greg Kroah-Hartman
2014-02-11 19:05 ` [PATCH 3.10 07/79] fuse: fix pipe_buf_operations Greg Kroah-Hartman
2014-02-11 19:05 ` [PATCH 3.10 08/79] audit: reset audit backlog wait time after error recovery Greg Kroah-Hartman
2014-02-11 19:05 ` [PATCH 3.10 09/79] audit: correct a type mismatch in audit_syscall_exit() Greg Kroah-Hartman
2014-02-11 19:05 ` [PATCH 3.10 10/79] mm/memory-failure.c: shift page lock from head page to tail page after thp split Greg Kroah-Hartman
2014-02-11 19:05 ` Greg Kroah-Hartman [this message]
2014-02-11 19:05 ` [PATCH 3.10 12/79] mm/page-writeback.c: do not count anon pages as dirtyable memory Greg Kroah-Hartman
2014-02-11 19:05 ` [PATCH 3.10 13/79] mmc: fix host release issue after discard operation Greg Kroah-Hartman
2014-02-11 19:05 ` [PATCH 3.10 14/79] mmc: atmel-mci: fix timeout errors in SDIO mode when using DMA Greg Kroah-Hartman
2014-02-11 19:05 ` [PATCH 3.10 15/79] slub: Fix calculation of cpu slabs Greg Kroah-Hartman
2014-02-11 19:05 ` [PATCH 3.10 16/79] turbostat: Dont put unprocessed uapi headers in the include path Greg Kroah-Hartman
2014-02-11 19:05 ` [PATCH 3.10 18/79] ACPI / init: Flag use of ACPI and ACPI idioms for power supplies to regulator API Greg Kroah-Hartman
2014-02-11 19:05 ` [PATCH 3.10 19/79] compat: fix sys_fanotify_mark Greg Kroah-Hartman
2014-02-11 19:05 ` [PATCH 3.10 20/79] fs/compat: fix parameter handling for compat readv/writev syscalls Greg Kroah-Hartman
2014-02-11 19:05 ` [PATCH 3.10 21/79] fs/compat: fix lookup_dcookie() parameter handling Greg Kroah-Hartman
2014-02-11 19:05 ` [PATCH 3.10 22/79] tile: remove compat_sys_lookup_dcookie declaration to fix compile error Greg Kroah-Hartman
2014-02-11 19:05 ` [PATCH 3.10 23/79] mtd: mxc_nand: remove duplicated ecc_stats counting Greg Kroah-Hartman
2014-02-11 19:05 ` [PATCH 3.10 24/79] ore: Fix wrong math in allocation of per device BIO Greg Kroah-Hartman
2014-02-11 19:05 ` [PATCH 3.10 25/79] xtensa: xtfpga: fix definitions of platform devices Greg Kroah-Hartman
2014-02-11 19:05 ` [PATCH 3.10 26/79] IB/qib: Fix QP check when looping back to/from QP1 Greg Kroah-Hartman
2014-02-11 19:05 ` [PATCH 3.10 27/79] spi/bcm63xx: dont substract prepend length from total length Greg Kroah-Hartman
2014-02-11 19:05 ` [PATCH 3.10 28/79] spidev: fix hang when transfer_one_message fails Greg Kroah-Hartman
2014-02-11 19:05 ` [PATCH 3.10 29/79] NFSv4: OPEN must handle the NFS4ERR_IO return code correctly Greg Kroah-Hartman
2014-02-11 19:05 ` [PATCH 3.10 30/79] nfs4.1: properly handle ENOTSUP in SECINFO_NO_NAME Greg Kroah-Hartman
2014-02-11 19:05 ` [PATCH 3.10 31/79] NFSv4.1: Handle errors correctly in nfs41_walk_client_list Greg Kroah-Hartman
2014-02-11 19:05 ` [PATCH 3.10 32/79] nfs4: fix discover_server_trunking use after free Greg Kroah-Hartman
2014-02-11 19:05 ` [PATCH 3.10 33/79] pnfs: Proper delay for NFS4ERR_RECALLCONFLICT in layout_get_done Greg Kroah-Hartman
2014-02-11 19:05 ` [PATCH 3.10 34/79] sunrpc: Fix infinite loop in RPC state machine Greg Kroah-Hartman
2014-02-11 19:05 ` [PATCH 3.10 35/79] dm thin: fix discard support to a previously shared block Greg Kroah-Hartman
2014-02-11 19:05 ` [PATCH 3.10 36/79] dm thin: initialize dm_thin_new_mapping returned by get_next_mapping Greg Kroah-Hartman
2014-02-11 19:05 ` [PATCH 3.10 37/79] dm: wait until embedded kobject is released before destroying a device Greg Kroah-Hartman
2014-02-11 19:05 ` [PATCH 3.10 38/79] dm space map common: make sure new space is used during extend Greg Kroah-Hartman
2014-02-11 19:05 ` [PATCH 3.10 39/79] dm space map metadata: fix extending the space map Greg Kroah-Hartman
2014-02-11 19:05 ` [PATCH 3.10 40/79] dm space map metadata: fix bug in resizing of thin metadata Greg Kroah-Hartman
2014-02-11 19:05 ` [PATCH 3.10 43/79] mm, oom: base root bonus on current usage Greg Kroah-Hartman
2014-02-11 19:05 ` [PATCH 3.10 44/79] media: anysee: fix non-working E30 Combo Plus DVB-T Greg Kroah-Hartman
2014-02-11 19:05 ` [PATCH 3.10 45/79] [media] dib8000: make 32 bits read atomic Greg Kroah-Hartman
2014-02-11 19:05 ` [PATCH 3.10 46/79] [media] media: s5p_mfc: remove s5p_mfc_get_node_type() function Greg Kroah-Hartman
2014-02-11 19:05 ` [PATCH 3.10 47/79] [media] nxt200x: increase write buffer size Greg Kroah-Hartman
2014-02-11 19:05 ` [PATCH 3.10 48/79] [media] dib8000: fix regression with dib807x Greg Kroah-Hartman
2014-02-11 19:05 ` [PATCH 3.10 49/79] [media] m88rs2000: add m88rs2000_set_carrieroffset Greg Kroah-Hartman
2014-02-11 19:05 ` [PATCH 3.10 50/79] [media] m88rs2000: set symbol rate accurately Greg Kroah-Hartman
2014-02-11 19:05 ` [PATCH 3.10 52/79] drm/radeon: disable ss on DP for DCE3.x Greg Kroah-Hartman
2014-02-11 19:05 ` [PATCH 3.10 53/79] drm/radeon: fix surface sync in fence on cayman (v2) Greg Kroah-Hartman
2014-02-11 19:05 ` [PATCH 3.10 54/79] drm/radeon: set the full cache bit for fences on r7xx+ Greg Kroah-Hartman
2014-02-11 19:05 ` [PATCH 3.10 55/79] drm/radeon: fix DAC interrupt handling on DCE5+ Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.10 56/79] drm/radeon/DCE4+: clear bios scratch dpms bit (v2) Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.10 57/79] dm sysfs: fix a module unload race Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.10 58/79] drm/nouveau: fix m2mf copy to tiled gart Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.10 59/79] drm/i915: Flush outstanding requests before allocating new seqno Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.10 60/79] drm/i915: Fix the offset issue for the stolen GEM objects Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.10 61/79] drm/i915: VLV2 - Fix hotplug detect bits Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.10 62/79] i915: remove pm_qos request on error Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.10 63/79] drm/cirrus: correct register values for 16bpp Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.10 64/79] drm/mgag200: fix typo causing bw limits to be ignored on some chips Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.10 65/79] mfd: lpc_ich: Add support for Intel Avoton SoC Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.10 66/79] mfd: lpc_ich: iTCO_wdt patch for Intel Coleto Creek DeviceIDs Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.10 67/79] i2c: i801: SMBus " Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.10 68/79] ftrace: Synchronize setting function_trace_op with ftrace_trace_function Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.10 69/79] ftrace: Fix synchronization location disabling and freeing ftrace_ops Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.10 70/79] ftrace: Have function graph only trace based on global_ops filters Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.10 71/79] timekeeping: Fix lost updates to tai adjustment Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.10 72/79] timekeeping: Fix CLOCK_TAI timer/nanosleep delays Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.10 73/79] timekeeping: Fix missing timekeeping_update in suspend path Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.10 74/79] rtc-cmos: Add an alarm disable quirk Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.10 75/79] timekeeping: Avoid possible deadlock from clock_was_set_delayed Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.10 76/79] intel_pstate: Add Haswell CPU models Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.10 77/79] intel_pstate: fix no_turbo Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.10 78/79] intel_pstate: Improve accuracy by not truncating until final result Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.10 79/79] intel_pstate: Correct calculation of min pstate value Greg Kroah-Hartman
2014-02-12  4:20 ` [PATCH 3.10 00/79] 3.10.30-stable review Guenter Roeck
2014-02-12 18:57   ` Shuah Khan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140211184721.257646791@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=fengguang.wu@intel.com \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@suse.cz \
    --cc=riel@redhat.com \
    --cc=stable@vger.kernel.org \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.