From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Mel Gorman <mgorman@techsingularity.net>,
	Mikulas Patocka <mpatocka@redhat.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	James Bottomley <James.Bottomley@hansenpartnership.com>,
	Matthew Wilcox <willy@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 5.0 20/89] mm: do not boost watermarks to avoid fragmentation for the DISCONTIG memory model
Date: Tue, 30 Apr 2019 13:38:11 +0200
Message-ID: <20190430113611.028764341@linuxfoundation.org>
In-Reply-To: <20190430113609.741196396@linuxfoundation.org>

From: Mel Gorman <mgorman@techsingularity.net>

commit 24512228b7a3f412b5a51f189df302616b021c33 upstream.

Mikulas Patocka reported that commit 1c30844d2dfe ("mm: reclaim small
amounts of memory when an external fragmentation event occurs") "broke"
memory management on parisc.

The machine is UMA, not NUMA, but the DISCONTIG model still creates
three pgdats, one for each of the following ranges:

        0) Start 0x0000000000000000 End 0x000000003fffffff Size   1024 MB
        1) Start 0x0000000100000000 End 0x00000001bfdfffff Size   3070 MB
        2) Start 0x0000004040000000 End 0x00000040ffffffff Size   3072 MB

Mikulas reported:

	With the patch 1c30844d2, the kernel will incorrectly reclaim the
	first zone when it fills up, ignoring the fact that there are two
	completely free zones. Basically, it limits cache size to 1GiB.

	For example, if I run:
	# dd if=/dev/sda of=/dev/null bs=1M count=2048

	- with the proper kernel, there should be "Buffers - 2GiB"
	when this command finishes. With the patch 1c30844d2, buffers
	will consume just 1GiB or slightly more, because the kernel was
	incorrectly reclaiming them.

The page allocator and reclaim assume that pgdats really represent NUMA
nodes and that zones represent ranges, and they make decisions on that
basis.  Watermark boosting for small pgdats leads to unexpected results
even though this would have behaved reasonably on SPARSEMEM.

DISCONTIG is essentially deprecated and even parisc plans to move to
SPARSEMEM, so there is no need to be fancy: this patch simply disables
watermark boosting by default on DISCONTIGMEM.

Link: http://lkml.kernel.org/r/20190419094335.GJ18914@techsingularity.net
Fixes: 1c30844d2dfe ("mm: reclaim small amounts of memory when an external fragmentation event occurs")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Tested-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
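Note for reviewers: the arithmetic behind "fractions of 10,000" in the
vm.txt hunk below can be sketched as follows. This is a minimal
illustration, not the code in mm/page_alloc.c. The helper names
boost_ceiling() and boost_after_event() are hypothetical, and the
one-pageblock-per-event step is an assumption modeled on the
documentation text. The figure of 512 pages per pageblock assumes 4 KiB
pages and the 2MB pageblock that vm.txt cites for 64-bit x86.

```c
/*
 * Hypothetical helper (not a kernel function): upper bound on the
 * watermark boost, in pages, for a given high watermark and
 * watermark_boost_factor. The factor is in units of 1/10,000, so
 * 15000 means 150% of the high watermark.
 */
static unsigned long boost_ceiling(unsigned long high_wmark_pages,
				   unsigned long boost_factor,
				   unsigned long pageblock_pages)
{
	unsigned long ceiling;

	/* A boost factor of 0 disables the feature entirely. */
	if (boost_factor == 0)
		return 0;

	/* Split the multiplication so the intermediate cannot overflow. */
	ceiling = high_wmark_pages / 10000 * boost_factor +
		  high_wmark_pages % 10000 * boost_factor / 10000;

	/* Never reclaim less than one pageblock's worth of pages. */
	if (ceiling < pageblock_pages)
		ceiling = pageblock_pages;

	return ceiling;
}

/*
 * Hypothetical helper: assume each fragmentation event raises the
 * boost by one pageblock until the ceiling is reached.
 */
static unsigned long boost_after_event(unsigned long current_boost,
				       unsigned long ceiling,
				       unsigned long pageblock_pages)
{
	unsigned long next = current_boost + pageblock_pages;

	return next < ceiling ? next : ceiling;
}
```

Under these assumptions, a zone with a high watermark of 4000 pages and
the default factor of 15000 may be boosted by at most 6000 pages, while
a small zone with a high watermark of 100 pages is still boosted by a
full pageblock. On a small DISCONTIGMEM pgdat that floor is a large
share of the range, which is consistent with the premature reclaim
described in the changelog above.
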
 Documentation/sysctl/vm.txt |   16 ++++++++--------
 mm/page_alloc.c             |   13 +++++++++++++
 2 files changed, 21 insertions(+), 8 deletions(-)

--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -866,14 +866,14 @@ The intent is that compaction has less w
 increase the success rate of future high-order allocations such as SLUB
 allocations, THP and hugetlbfs pages.
 
-To make it sensible with respect to the watermark_scale_factor parameter,
-the unit is in fractions of 10,000. The default value of 15,000 means
-that up to 150% of the high watermark will be reclaimed in the event of
-a pageblock being mixed due to fragmentation. The level of reclaim is
-determined by the number of fragmentation events that occurred in the
-recent past. If this value is smaller than a pageblock then a pageblocks
-worth of pages will be reclaimed (e.g.  2MB on 64-bit x86). A boost factor
-of 0 will disable the feature.
+To make it sensible with respect to the watermark_scale_factor
+parameter, the unit is in fractions of 10,000. The default value of
+15,000 on !DISCONTIGMEM configurations means that up to 150% of the high
+watermark will be reclaimed in the event of a pageblock being mixed due
+to fragmentation. The level of reclaim is determined by the number of
+fragmentation events that occurred in the recent past. If this value is
+smaller than a pageblock then a pageblocks worth of pages will be reclaimed
+(e.g.  2MB on 64-bit x86). A boost factor of 0 will disable the feature.
 
 =============================================================
 
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -266,7 +266,20 @@ compound_page_dtor * const compound_page
 
 int min_free_kbytes = 1024;
 int user_min_free_kbytes = -1;
+#ifdef CONFIG_DISCONTIGMEM
+/*
+ * DiscontigMem defines memory ranges as separate pg_data_t even if the ranges
+ * are not on separate NUMA nodes. Functionally this works but with
+ * watermark_boost_factor, it can reclaim prematurely as the ranges can be
+ * quite small. By default, do not boost watermarks on discontigmem as in
+ * many cases very high-order allocations like THP are likely to be
+ * unsupported and the premature reclaim offsets the advantage of long-term
+ * fragmentation avoidance.
+ */
+int watermark_boost_factor __read_mostly;
+#else
 int watermark_boost_factor __read_mostly = 15000;
+#endif
 int watermark_scale_factor = 10;
 
 static unsigned long nr_kernel_pages __initdata;


