From: Andrew Morton <akpm@linux-foundation.org>
To: adobriyan@gmail.com, akpm@linux-foundation.org,
bob.picco@oracle.com, dan.j.williams@intel.com,
daniel.m.jordan@oracle.com, david@redhat.com, linux-mm@kvack.org,
mhocko@kernel.org, mhocko@suse.com, mm-commits@vger.kernel.org,
n-horiguchi@ah.jp.nec.com, osalvador@suse.de,
pasha.tatashin@oracle.com, sfr@canb.auug.org.au,
stable@vger.kernel.org, steven.sistare@oracle.com,
torvalds@linux-foundation.org
Subject: [patch 02/67] mm/page_alloc.c: fix uninitialized memmaps on a partially populated last section
Date: Mon, 03 Feb 2020 17:33:48 -0800 [thread overview]
Message-ID: <20200204013348.GxUmZtFO4%akpm@linux-foundation.org> (raw)
In-Reply-To: <20200203173311.6269a8be06a05e5a4aa08a93@linux-foundation.org>
From: David Hildenbrand <david@redhat.com>
Subject: mm/page_alloc.c: fix uninitialized memmaps on a partially populated last section
Patch series "mm: fix max_pfn not falling on section boundary", v2.
Playing with different memory sizes for a x86-64 guest, I discovered that
some memmaps (highest section if max_mem does not fall on the section
boundary) are marked as being valid and online, but contain garbage. We
have to properly initialize these memmaps.
Looking at /proc/kpageflags and friends, I found some more issues,
partially related to this.
This patch (of 3):
If max_pfn is not aligned to a section boundary, we can easily run into
BUGs. This can e.g., be triggered on x86-64 under QEMU by specifying a
memory size that is not a multiple of 128MB (e.g., 4097MB, but also
4160MB). I was told that on real HW, we can easily have this scenario
(esp., one of the main reasons sub-section hotadd of devmem was added).
The issue is, that we have a valid memmap (pfn_valid()) for the whole
section, and the whole section will be marked "online".
pfn_to_online_page() will succeed, but the memmap contains garbage.
E.g., doing a "./page-types -r -a 0x144001" when QEMU was started with "-m
4160M" - (see tools/vm/page-types.c):
[ 200.476376] BUG: unable to handle page fault for address: fffffffffffffffe
[ 200.477500] #PF: supervisor read access in kernel mode
[ 200.478334] #PF: error_code(0x0000) - not-present page
[ 200.479076] PGD 59614067 P4D 59614067 PUD 59616067 PMD 0
[ 200.479557] Oops: 0000 [#4] SMP NOPTI
[ 200.479875] CPU: 0 PID: 603 Comm: page-types Tainted: G D W 5.5.0-rc1-next-20191209 #93
[ 200.480646] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu4
[ 200.481648] RIP: 0010:stable_page_flags+0x4d/0x410
[ 200.482061] Code: f3 ff 41 89 c0 48 b8 00 00 00 00 01 00 00 00 45 84 c0 0f 85 cd 02 00 00 48 8b 53 08 48 8b 2b 48f
[ 200.483644] RSP: 0018:ffffb139401cbe60 EFLAGS: 00010202
[ 200.484091] RAX: fffffffffffffffe RBX: fffffbeec5100040 RCX: 0000000000000000
[ 200.484697] RDX: 0000000000000001 RSI: ffffffff9535c7cd RDI: 0000000000000246
[ 200.485313] RBP: ffffffffffffffff R08: 0000000000000000 R09: 0000000000000000
[ 200.485917] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000144001
[ 200.486523] R13: 00007ffd6ba55f48 R14: 00007ffd6ba55f40 R15: ffffb139401cbf08
[ 200.487130] FS: 00007f68df717580(0000) GS:ffff9ec77fa00000(0000) knlGS:0000000000000000
[ 200.487804] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 200.488295] CR2: fffffffffffffffe CR3: 0000000135d48000 CR4: 00000000000006f0
[ 200.488897] Call Trace:
[ 200.489115] kpageflags_read+0xe9/0x140
[ 200.489447] proc_reg_read+0x3c/0x60
[ 200.489755] vfs_read+0xc2/0x170
[ 200.490037] ksys_pread64+0x65/0xa0
[ 200.490352] do_syscall_64+0x5c/0xa0
[ 200.490665] entry_SYSCALL_64_after_hwframe+0x49/0xbe
But it can be triggered much easier via "cat /proc/kpageflags > /dev/null"
after cold/hot plugging a DIMM to such a system:
[root@localhost ~]# cat /proc/kpageflags > /dev/null
[ 111.517275] BUG: unable to handle page fault for address: fffffffffffffffe
[ 111.517907] #PF: supervisor read access in kernel mode
[ 111.518333] #PF: error_code(0x0000) - not-present page
[ 111.518771] PGD a240e067 P4D a240e067 PUD a2410067 PMD 0
This patch fixes that by at least zero-ing out that memmap (so e.g.,
page_to_pfn() will not crash). Commit 907ec5fca3dc ("mm: zero remaining
unavailable struct pages") tried to fix a similar issue, but forgot to
consider this special case.
After this patch, there are still problems to solve. E.g., not all of
these pages falling into a memory hole will actually get initialized later
and set PageReserved - they are only zeroed out - but at least the
immediate crashes are gone. A follow-up patch will take care of this.
Link: http://lkml.kernel.org/r/20191211163201.17179-2-david@redhat.com
Fixes: f7f99100d8d9 ("mm: stop zeroing memory during allocation in vmemmap")
Signed-off-by: David Hildenbrand <david@redhat.com>
Tested-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Steven Sistare <steven.sistare@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Bob Picco <bob.picco@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: <stable@vger.kernel.org> [4.15+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/page_alloc.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
--- a/mm/page_alloc.c~mm-fix-uninitialized-memmaps-on-a-partially-populated-last-section
+++ a/mm/page_alloc.c
@@ -6947,7 +6947,8 @@ static u64 zero_pfn_range(unsigned long
* This function also addresses a similar issue where struct pages are left
* uninitialized because the physical address range is not covered by
* memblock.memory or memblock.reserved. That could happen when memblock
- * layout is manually configured via memmap=.
+ * layout is manually configured via memmap=, or when the highest physical
+ * address (max_pfn) does not end on a section boundary.
*/
void __init zero_resv_unavail(void)
{
@@ -6965,7 +6966,16 @@ void __init zero_resv_unavail(void)
pgcnt += zero_pfn_range(PFN_DOWN(next), PFN_UP(start));
next = end;
}
- pgcnt += zero_pfn_range(PFN_DOWN(next), max_pfn);
+
+ /*
+ * Early sections always have a fully populated memmap for the whole
+ * section - see pfn_valid(). If the last section has holes at the
+ * end and that section is marked "online", the memmap will be
+ * considered initialized. Make sure that memmap has a well defined
+ * state.
+ */
+ pgcnt += zero_pfn_range(PFN_DOWN(next),
+ round_up(max_pfn, PAGES_PER_SECTION));
/*
* Struct pages that do not have backing memory. This could be because
_
next prev parent reply other threads:[~2020-02-04 1:33 UTC|newest]
Thread overview: 95+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-02-04 1:33 incoming Andrew Morton
2020-02-04 1:33 ` [patch 01/67] ocfs2: fix oops when writing cloned file Andrew Morton
2020-02-04 1:33 ` Andrew Morton [this message]
2020-02-04 1:33 ` [patch 03/67] fs/proc/page.c: allow inspection of last section and fix end detection Andrew Morton
2020-02-04 1:33 ` [patch 04/67] mm/page_alloc.c: initialize memmap of unavailable memory directly Andrew Morton
2020-02-04 1:33 ` [patch 05/67] mm/page_alloc: fix and rework pfn handling in memmap_init_zone() Andrew Morton
2020-02-04 2:45 ` Linus Torvalds
2020-02-04 1:34 ` [patch 06/67] mm: factor out next_present_section_nr() Andrew Morton
2020-02-04 3:04 ` Linus Torvalds
2020-02-04 4:29 ` Andrew Morton
2020-02-04 8:22 ` David Hildenbrand
2020-02-04 1:34 ` [patch 07/67] mm/memmap_init: update variable name in memmap_init_zone Andrew Morton
2020-02-04 1:34 ` [patch 08/67] mm/memory_hotplug: poison memmap in remove_pfn_range_from_zone() Andrew Morton
2020-02-04 1:34 ` [patch 09/67] mm/memory_hotplug: we always have a zone in find_(smallest|biggest)_section_pfn Andrew Morton
2020-02-04 1:34 ` [patch 10/67] mm/memory_hotplug: don't check for "all holes" in shrink_zone_span() Andrew Morton
2020-02-04 1:34 ` [patch 11/67] mm/memory_hotplug: drop local variables " Andrew Morton
2020-02-04 1:34 ` [patch 12/67] mm/memory_hotplug: cleanup __remove_pages() Andrew Morton
2020-02-04 1:34 ` [patch 13/67] mm/memory_hotplug: drop valid_start/valid_end from test_pages_in_a_zone() Andrew Morton
2020-02-04 1:34 ` [patch 14/67] smp_mb__{before,after}_atomic(): update Documentation Andrew Morton
2020-02-04 1:34 ` [patch 15/67] ipc/mqueue.c: remove duplicated code Andrew Morton
2020-02-04 1:34 ` [patch 16/67] ipc/mqueue.c: update/document memory barriers Andrew Morton
2020-02-04 1:34 ` [patch 17/67] ipc/msg.c: update and document " Andrew Morton
2020-02-04 1:34 ` [patch 18/67] ipc/sem.c: document and update " Andrew Morton
2020-02-04 1:34 ` [patch 19/67] ipc/msg.c: consolidate all xxxctl_down() functions Andrew Morton
2020-02-04 1:34 ` [patch 20/67] drivers/block/null_blk_main.c: fix layout Andrew Morton
2020-02-04 1:34 ` [patch 21/67] drivers/block/null_blk_main.c: fix uninitialized var warnings Andrew Morton
2020-02-04 1:34 ` [patch 22/67] pinctrl: fix pxa2xx.c build warnings Andrew Morton
2020-02-04 1:34 ` [patch 23/67] mm: remove __krealloc Andrew Morton
2020-02-04 1:35 ` [patch 24/67] mm: add generic p?d_leaf() macros Andrew Morton
2020-02-04 1:35 ` [patch 25/67] arc: mm: add p?d_leaf() definitions Andrew Morton
2020-02-04 1:35 ` [patch 26/67] arm: " Andrew Morton
2020-02-04 1:35 ` [patch 27/67] arm64: " Andrew Morton
2020-02-04 1:35 ` [patch 28/67] mips: " Andrew Morton
2020-02-04 1:35 ` [patch 29/67] powerpc: " Andrew Morton
2020-02-04 1:35 ` [patch 30/67] riscv: " Andrew Morton
2020-02-04 1:35 ` [patch 31/67] s390: " Andrew Morton
2020-02-04 1:35 ` [patch 32/67] sparc: " Andrew Morton
2020-02-04 1:35 ` [patch 33/67] x86: " Andrew Morton
2020-02-04 1:35 ` [patch 34/67] mm: pagewalk: add p4d_entry() and pgd_entry() Andrew Morton
2020-02-04 1:35 ` [patch 35/67] mm: pagewalk: allow walking without vma Andrew Morton
2020-02-04 1:35 ` [patch 36/67] mm: pagewalk: don't lock PTEs for walk_page_range_novma() Andrew Morton
2020-02-04 1:35 ` [patch 37/67] mm: pagewalk: fix termination condition in walk_pte_range() Andrew Morton
2020-02-04 1:36 ` [patch 38/67] mm: pagewalk: add 'depth' parameter to pte_hole Andrew Morton
2020-02-04 1:36 ` [patch 39/67] x86: mm: point to struct seq_file from struct pg_state Andrew Morton
2020-02-04 1:36 ` [patch 40/67] x86: mm+efi: convert ptdump_walk_pgd_level() to take a mm_struct Andrew Morton
2020-02-04 1:36 ` [patch 41/67] x86: mm: convert ptdump_walk_pgd_level_debugfs() to take an mm_struct Andrew Morton
2020-02-04 1:36 ` [patch 42/67] mm: add generic ptdump Andrew Morton
2020-02-04 1:36 ` [patch 43/67] x86: mm: convert dump_pagetables to use walk_page_range Andrew Morton
2020-02-04 1:36 ` [patch 44/67] arm64: mm: convert mm/dump.c to use walk_page_range() Andrew Morton
2020-02-04 1:36 ` [patch 45/67] arm64: mm: display non-present entries in ptdump Andrew Morton
2020-02-04 1:36 ` [patch 46/67] mm: ptdump: reduce level numbers by 1 in note_page() Andrew Morton
2020-02-04 1:36 ` [patch 47/67] x86: mm: avoid allocating struct mm_struct on the stack Andrew Morton
2020-02-04 1:36 ` [patch 48/67] powerpc/mmu_gather: enable RCU_TABLE_FREE even for !SMP case Andrew Morton
2020-02-04 1:36 ` [patch 49/67] mm/mmu_gather: invalidate TLB correctly on batch allocation failure and flush Andrew Morton
2020-02-04 1:36 ` [patch 50/67] asm-generic/tlb: avoid potential double flush Andrew Morton
2020-02-04 1:36 ` [patch 51/67] asm-gemeric/tlb: remove stray function declarations Andrew Morton
2020-02-04 1:36 ` [patch 52/67] asm-generic/tlb: add missing CONFIG symbol Andrew Morton
2020-02-04 1:37 ` [patch 53/67] asm-generic/tlb: rename HAVE_RCU_TABLE_FREE Andrew Morton
2020-02-04 1:37 ` [patch 54/67] asm-generic/tlb: rename HAVE_MMU_GATHER_PAGE_SIZE Andrew Morton
2020-02-04 1:37 ` [patch 55/67] asm-generic/tlb: rename HAVE_MMU_GATHER_NO_GATHER Andrew Morton
2020-02-04 1:37 ` [patch 56/67] asm-generic/tlb: provide MMU_GATHER_TABLE_FREE Andrew Morton
2020-02-04 1:37 ` [patch 57/67] proc: decouple proc from VFS with "struct proc_ops" Andrew Morton
2020-02-04 1:37 ` [patch 58/67] proc: convert everything to " Andrew Morton
2020-02-04 1:37 ` [patch 59/67] lib/string: add strnchrnul() Andrew Morton
2020-02-04 1:37 ` [patch 60/67] bitops: more BITS_TO_* macros Andrew Morton
2020-02-04 1:37 ` [patch 61/67] lib: add test for bitmap_parse() Andrew Morton
2020-02-04 1:37 ` [patch 62/67] lib: make bitmap_parse_user a wrapper on bitmap_parse Andrew Morton
2020-02-04 1:37 ` [patch 63/67] lib: rework bitmap_parse() Andrew Morton
2020-02-04 1:37 ` [patch 64/67] lib: new testcases for bitmap_parse{_user} Andrew Morton
2020-02-04 1:37 ` [patch 65/67] include/linux/cpumask.h: don't calculate length of the input string Andrew Morton
2020-02-04 1:37 ` [patch 66/67] treewide: remove redundant IS_ERR() before error code check Andrew Morton
2020-02-04 1:37 ` [patch 67/67] ARM: dma-api: fix max_pfn off-by-one error in __dma_supported() Andrew Morton
2020-02-04 2:27 ` incoming Linus Torvalds
2020-02-04 2:46 ` incoming Andrew Morton
2020-02-04 3:11 ` incoming Linus Torvalds
2020-02-14 6:26 ` mmotm 2020-02-13-22-26 uploaded Andrew Morton
2020-02-14 16:29 ` mmotm 2020-02-13-22-26 uploaded (mm/hugetlb.c) Randy Dunlap
2020-02-14 17:18 ` Mike Kravetz
2020-02-14 20:51 ` Mina Almasry
[not found] ` <20200214204544.231482-1-almasrymina@google.com>
2020-02-14 21:00 ` [PATCH] hugetlb: fix CONFIG_CGROUP_HUGETLB ifdefs Mina Almasry
2020-02-15 1:17 ` Randy Dunlap
2020-02-15 1:56 ` Randy Dunlap
2020-02-16 20:40 ` Mina Almasry
2020-02-16 21:03 ` Mina Almasry
2020-02-17 2:48 ` Randy Dunlap
2020-02-17 2:57 ` [PATCH] hugetlb: fix <linux/hugetlb_cgroup.h> structs Randy Dunlap
2020-02-17 3:53 ` [PATCH] hugetlb: fix CONFIG_CGROUP_HUGETLB ifdefs Stephen Rothwell
2020-02-14 16:49 ` mmotm 2020-02-13-22-26 uploaded (mm/migrate.c, hugetlb_cgroup.h) Randy Dunlap
2020-02-25 3:53 ` mmotm 2020-02-24-19-53 uploaded Andrew Morton
2020-02-25 6:16 ` mmotm 2020-02-24-19-53 uploaded (init/main.c: initrd*) Randy Dunlap
2020-02-25 6:18 ` Randy Dunlap
2020-02-25 6:21 ` Randy Dunlap
2020-02-25 16:41 ` mmotm 2020-02-24-19-53 uploaded (drivers/platform/x86/intel_pmc_core.c) Randy Dunlap
2020-02-25 17:01 ` mmotm 2020-02-24-19-53 uploaded (objtool warning) Randy Dunlap
2020-02-27 21:52 ` Josh Poimboeuf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200204013348.GxUmZtFO4%akpm@linux-foundation.org \
--to=akpm@linux-foundation.org \
--cc=adobriyan@gmail.com \
--cc=bob.picco@oracle.com \
--cc=dan.j.williams@intel.com \
--cc=daniel.m.jordan@oracle.com \
--cc=david@redhat.com \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=mhocko@suse.com \
--cc=mm-commits@vger.kernel.org \
--cc=n-horiguchi@ah.jp.nec.com \
--cc=osalvador@suse.de \
--cc=pasha.tatashin@oracle.com \
--cc=sfr@canb.auug.org.au \
--cc=stable@vger.kernel.org \
--cc=steven.sistare@oracle.com \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).