All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Roman Gushchin <guro@fb.com>,
	Mike Rapoport <rppt@linux.ibm.com>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Michal Hocko <mhocko@kernel.org>, Rik van Riel <riel@surriel.com>,
	Wonhyuk Yang <vvghjk1234@gmail.com>,
	Thiago Jung Bauermann <bauerman@linux.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Sasha Levin <sashal@kernel.org>
Subject: [PATCH 4.9 23/49] memblock: do not start bottom-up allocations with kernel_end
Date: Mon, 22 Feb 2021 13:36:21 +0100	[thread overview]
Message-ID: <20210222121026.639879772@linuxfoundation.org> (raw)
In-Reply-To: <20210222121022.546148341@linuxfoundation.org>

From: Roman Gushchin <guro@fb.com>

[ Upstream commit 2dcb3964544177c51853a210b6ad400de78ef17d ]

With kaslr the kernel image is placed at a random place, so starting the
bottom-up allocation with the kernel_end can result in an allocation
failure and a warning like this one:

  hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node
  ------------[ cut here ]------------
  memblock: bottom-up allocation failed, memory hotremove may be affected
  WARNING: CPU: 0 PID: 0 at mm/memblock.c:332 memblock_find_in_range_node+0x178/0x25a
  Modules linked in:
  CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0+ #1169
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
  RIP: 0010:memblock_find_in_range_node+0x178/0x25a
  Code: e9 6d ff ff ff 48 85 c0 0f 85 da 00 00 00 80 3d 9b 35 df 00 00 75 15 48 c7 c7 c0 75 59 88 c6 05 8b 35 df 00 01 e8 25 8a fa ff <0f> 0b 48 c7 44 24 20 ff ff ff ff 44 89 e6 44 89 ea 48 c7 c1 70 5c
  RSP: 0000:ffffffff88803d18 EFLAGS: 00010086 ORIG_RAX: 0000000000000000
  RAX: 0000000000000000 RBX: 0000000240000000 RCX: 00000000ffffdfff
  RDX: 00000000ffffdfff RSI: 00000000ffffffea RDI: 0000000000000046
  RBP: 0000000100000000 R08: ffffffff88922788 R09: 0000000000009ffb
  R10: 00000000ffffe000 R11: 3fffffffffffffff R12: 0000000000000000
  R13: 0000000000000000 R14: 0000000080000000 R15: 00000001fb42c000
  FS:  0000000000000000(0000) GS:ffffffff88f71000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffffa080fb401000 CR3: 00000001fa80a000 CR4: 00000000000406b0
  Call Trace:
    memblock_alloc_range_nid+0x8d/0x11e
    cma_declare_contiguous_nid+0x2c4/0x38c
    hugetlb_cma_reserve+0xdc/0x128
    flush_tlb_one_kernel+0xc/0x20
    native_set_fixmap+0x82/0xd0
    flat_get_apic_id+0x5/0x10
    register_lapic_address+0x8e/0x97
    setup_arch+0x8a5/0xc3f
    start_kernel+0x66/0x547
    load_ucode_bsp+0x4c/0xcd
    secondary_startup_64_no_verify+0xb0/0xbb
  random: get_random_bytes called from __warn+0xab/0x110 with crng_init=0
  ---[ end trace f151227d0b39be70 ]---

At the same time, the kernel image is protected with memblock_reserve(),
so we can just start searching at PAGE_SIZE.  In this case the bottom-up
allocation has the same chances to success as a top-down allocation, so
there is no reason to fallback in the case of a failure.  All together it
simplifies the logic.

Link: https://lkml.kernel.org/r/20201217201214.3414100-2-guro@fb.com
Fixes: 8fabc623238e ("powerpc: Ensure that swiotlb buffer is allocated from low memory")
Signed-off-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Wonhyuk Yang <vvghjk1234@gmail.com>
Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 mm/memblock.c | 48 ++++++------------------------------------------
 1 file changed, 6 insertions(+), 42 deletions(-)

diff --git a/mm/memblock.c b/mm/memblock.c
index 42b98af6a4158..e43065b13c08c 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -186,14 +186,6 @@ __memblock_find_range_top_down(phys_addr_t start, phys_addr_t end,
  *
  * Find @size free area aligned to @align in the specified range and node.
  *
- * When allocation direction is bottom-up, the @start should be greater
- * than the end of the kernel image. Otherwise, it will be trimmed. The
- * reason is that we want the bottom-up allocation just near the kernel
- * image so it is highly likely that the allocated memory and the kernel
- * will reside in the same node.
- *
- * If bottom-up allocation failed, will try to allocate memory top-down.
- *
  * RETURNS:
  * Found address on success, 0 on failure.
  */
@@ -201,8 +193,6 @@ phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t size,
 					phys_addr_t align, phys_addr_t start,
 					phys_addr_t end, int nid, ulong flags)
 {
-	phys_addr_t kernel_end, ret;
-
 	/* pump up @end */
 	if (end == MEMBLOCK_ALLOC_ACCESSIBLE)
 		end = memblock.current_limit;
@@ -210,39 +200,13 @@ phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t size,
 	/* avoid allocating the first page */
 	start = max_t(phys_addr_t, start, PAGE_SIZE);
 	end = max(start, end);
-	kernel_end = __pa_symbol(_end);
-
-	/*
-	 * try bottom-up allocation only when bottom-up mode
-	 * is set and @end is above the kernel image.
-	 */
-	if (memblock_bottom_up() && end > kernel_end) {
-		phys_addr_t bottom_up_start;
-
-		/* make sure we will allocate above the kernel */
-		bottom_up_start = max(start, kernel_end);
 
-		/* ok, try bottom-up allocation first */
-		ret = __memblock_find_range_bottom_up(bottom_up_start, end,
-						      size, align, nid, flags);
-		if (ret)
-			return ret;
-
-		/*
-		 * we always limit bottom-up allocation above the kernel,
-		 * but top-down allocation doesn't have the limit, so
-		 * retrying top-down allocation may succeed when bottom-up
-		 * allocation failed.
-		 *
-		 * bottom-up allocation is expected to be fail very rarely,
-		 * so we use WARN_ONCE() here to see the stack trace if
-		 * fail happens.
-		 */
-		WARN_ONCE(1, "memblock: bottom-up allocation failed, memory hotunplug may be affected\n");
-	}
-
-	return __memblock_find_range_top_down(start, end, size, align, nid,
-					      flags);
+	if (memblock_bottom_up())
+		return __memblock_find_range_bottom_up(start, end, size, align,
+						       nid, flags);
+	else
+		return __memblock_find_range_top_down(start, end, size, align,
+						      nid, flags);
 }
 
 /**
-- 
2.27.0




  parent reply	other threads:[~2021-02-22 13:48 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-22 12:35 [PATCH 4.9 00/49] 4.9.258-rc1 review Greg Kroah-Hartman
2021-02-22 12:35 ` [PATCH 4.9 01/49] mm: memcontrol: fix NULL pointer crash in test_clear_page_writeback() Greg Kroah-Hartman
2021-02-22 12:36 ` [PATCH 4.9 02/49] fgraph: Initialize tracing_graph_pause at task creation Greg Kroah-Hartman
2021-02-22 12:36 ` [PATCH 4.9 03/49] remoteproc: qcom_q6v5_mss: Validate MBA firmware size before load Greg Kroah-Hartman
2021-02-22 12:36 ` [PATCH 4.9 04/49] af_key: relax availability checks for skb size calculation Greg Kroah-Hartman
2021-02-22 12:36 ` [PATCH 4.9 05/49] iwlwifi: mvm: take mutex for calling iwl_mvm_get_sync_time() Greg Kroah-Hartman
2021-02-22 12:36 ` [PATCH 4.9 06/49] iwlwifi: pcie: add a NULL check in iwl_pcie_txq_unmap Greg Kroah-Hartman
2021-02-22 12:36 ` [PATCH 4.9 07/49] iwlwifi: mvm: guard against device removal in reprobe Greg Kroah-Hartman
2021-02-22 12:36 ` [PATCH 4.9 08/49] SUNRPC: Move simple_get_bytes and simple_get_netobj into private header Greg Kroah-Hartman
2021-02-22 12:36 ` [PATCH 4.9 09/49] SUNRPC: Handle 0 length opaque XDR object data properly Greg Kroah-Hartman
2021-02-22 12:36 ` [PATCH 4.9 10/49] lib/string: Add strscpy_pad() function Greg Kroah-Hartman
2021-02-22 12:36 ` [PATCH 4.9 11/49] include/trace/events/writeback.h: fix -Wstringop-truncation warnings Greg Kroah-Hartman
2021-02-22 12:36 ` [PATCH 4.9 12/49] memcg: fix a crash in wb_workfn when a device disappears Greg Kroah-Hartman
2021-02-22 12:36 ` [PATCH 4.9 13/49] futex: Ensure the correct return value from futex_lock_pi() Greg Kroah-Hartman
2021-02-22 12:36 ` [PATCH 4.9 14/49] futex: Change locking rules Greg Kroah-Hartman
2021-02-22 12:36 ` [PATCH 4.9 15/49] futex: Cure exit race Greg Kroah-Hartman
2021-02-22 12:36 ` [PATCH 4.9 16/49] squashfs: add more sanity checks in id lookup Greg Kroah-Hartman
2021-02-22 12:36 ` [PATCH 4.9 17/49] squashfs: add more sanity checks in inode lookup Greg Kroah-Hartman
2021-02-22 12:36 ` [PATCH 4.9 18/49] squashfs: add more sanity checks in xattr id lookup Greg Kroah-Hartman
2021-02-22 12:36 ` [PATCH 4.9 19/49] tracing: Do not count ftrace events in top level enable output Greg Kroah-Hartman
2021-02-22 12:36 ` [PATCH 4.9 20/49] tracing: Check length before giving out the filter buffer Greg Kroah-Hartman
2021-02-22 12:36 ` [PATCH 4.9 21/49] ovl: skip getxattr of security labels Greg Kroah-Hartman
2021-02-22 12:36 ` [PATCH 4.9 22/49] ARM: dts: lpc32xx: Revert set default clock rate of HCLK PLL Greg Kroah-Hartman
2021-02-22 12:36 ` Greg Kroah-Hartman [this message]
2021-02-22 12:36 ` [PATCH 4.9 24/49] bpf: Check for integer overflow when using roundup_pow_of_two() Greg Kroah-Hartman
2021-02-22 12:36 ` [PATCH 4.9 25/49] netfilter: xt_recent: Fix attempt to update deleted entry Greg Kroah-Hartman
2021-02-22 12:36 ` [PATCH 4.9 26/49] xen/netback: avoid race in xenvif_rx_ring_slots_available() Greg Kroah-Hartman
2021-02-22 12:36 ` [PATCH 4.9 27/49] netfilter: conntrack: skip identical origin tuple in same zone only Greg Kroah-Hartman
2021-02-22 12:36 ` [PATCH 4.9 28/49] h8300: fix PREEMPTION build, TI_PRE_COUNT undefined Greg Kroah-Hartman
2021-02-22 12:36 ` [PATCH 4.9 29/49] usb: dwc3: ulpi: fix checkpatch warning Greg Kroah-Hartman
2021-02-22 12:36 ` [PATCH 4.9 30/49] usb: dwc3: ulpi: Replace CPU-based busyloop with Protocol-based one Greg Kroah-Hartman
2021-02-22 12:36 ` [PATCH 4.9 31/49] net/vmw_vsock: improve locking in vsock_connect_timeout() Greg Kroah-Hartman
2021-02-22 12:36 ` [PATCH 4.9 32/49] net: watchdog: hold device global xmit lock during tx disable Greg Kroah-Hartman
2021-02-22 12:36 ` [PATCH 4.9 33/49] vsock/virtio: update credit only if socket is not closed Greg Kroah-Hartman
2021-02-22 12:36 ` [PATCH 4.9 34/49] vsock: fix locking in vsock_shutdown() Greg Kroah-Hartman
2021-02-22 12:36 ` [PATCH 4.9 35/49] x86/build: Disable CET instrumentation in the kernel for 32-bit too Greg Kroah-Hartman
2021-02-22 12:36 ` [PATCH 4.9 36/49] trace: Use -mcount-record for dynamic ftrace Greg Kroah-Hartman
2021-02-22 12:36 ` [PATCH 4.9 37/49] tracing: Fix SKIP_STACK_VALIDATION=1 build due to bad merge with -mrecord-mcount Greg Kroah-Hartman
2021-02-22 12:36 ` [PATCH 4.9 38/49] tracing: Avoid calling cc-option -mrecord-mcount for every Makefile Greg Kroah-Hartman
2021-02-22 12:36 ` [PATCH 4.9 39/49] Xen/x86: dont bail early from clear_foreign_p2m_mapping() Greg Kroah-Hartman
2021-02-22 12:36 ` [PATCH 4.9 40/49] Xen/x86: also check kernel mapping in set_foreign_p2m_mapping() Greg Kroah-Hartman
2021-02-22 12:36 ` [PATCH 4.9 41/49] Xen/gntdev: correct dev_bus_addr handling in gntdev_map_grant_pages() Greg Kroah-Hartman
2021-02-22 12:36 ` [PATCH 4.9 42/49] Xen/gntdev: correct error checking " Greg Kroah-Hartman
2021-02-22 12:36 ` [PATCH 4.9 43/49] xen/arm: dont ignore return errors from set_phys_to_machine Greg Kroah-Hartman
2021-02-22 12:36 ` [PATCH 4.9 44/49] xen-blkback: dont "handle" error by BUG() Greg Kroah-Hartman
2021-02-22 12:36 ` [PATCH 4.9 45/49] xen-netback: " Greg Kroah-Hartman
2021-02-22 12:36 ` [PATCH 4.9 46/49] xen-scsiback: " Greg Kroah-Hartman
2021-02-22 12:36 ` [PATCH 4.9 47/49] xen-blkback: fix error handling in xen_blkbk_map() Greg Kroah-Hartman
2021-02-22 12:36 ` [PATCH 4.9 48/49] scsi: qla2xxx: Fix crash during driver load on big endian machines Greg Kroah-Hartman
2021-02-22 12:36 ` [PATCH 4.9 49/49] kvm: check tlbs_dirty directly Greg Kroah-Hartman
2021-02-22 18:24 ` [PATCH 4.9 00/49] 4.9.258-rc1 review Florian Fainelli
2021-02-22 21:27 ` Guenter Roeck
2021-02-23 12:04 ` Naresh Kamboju
2021-02-23 13:47 ` Jon Hunter
2021-02-23 21:19 ` Shuah Khan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210222121026.639879772@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=bauerman@linux.ibm.com \
    --cc=guro@fb.com \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhocko@kernel.org \
    --cc=riel@surriel.com \
    --cc=rppt@linux.ibm.com \
    --cc=sashal@kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=vvghjk1234@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.