From: David Hildenbrand <david@redhat.com>
To: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	stable@vger.kernel.org,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	Pavel Tatashin <pasha.tatashin@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Steven Sistare <steven.sistare@oracle.com>,
	Michal Hocko <mhocko@suse.com>, Bob Picco <bob.picco@oracle.com>,
	Oscar Salvador <osalvador@suse.de>
Subject: Re: [PATCH v1 1/3] mm: fix uninitialized memmaps on a partially populated last section
Date: Tue, 10 Dec 2019 11:11:03 +0100	[thread overview]
Message-ID: <c0733e11-bf06-8813-11de-019cdbddef34@redhat.com> (raw)
In-Reply-To: <20191209211502.zhbvzv2qwbvcperm@ca-dmjordan1.us.oracle.com>

On 09.12.19 22:15, Daniel Jordan wrote:
> Hi David,
> 
> On Mon, Dec 09, 2019 at 06:48:34PM +0100, David Hildenbrand wrote:
>> If max_pfn is not aligned to a section boundary, we can easily run into
>> BUGs. This can, e.g., be triggered on x86-64 under QEMU by specifying a
>> memory size that is not a multiple of 128MB (e.g., 4097MB, but also
>> 4160MB). I was told that this scenario is easy to hit on real HW, too
>> (it was one of the main reasons sub-section hotadd of devmem was added).
>>
>> The issue is that we have a valid memmap (pfn_valid()) for the
>> whole section, and the whole section will be marked "online".
>> pfn_to_online_page() will succeed, but the memmap contains garbage.
>>
>> E.g., doing a "cat /proc/kpageflags > /dev/null" results in
>>
>> [  303.218313] BUG: unable to handle page fault for address: fffffffffffffffe
>> [  303.218899] #PF: supervisor read access in kernel mode
>> [  303.219344] #PF: error_code(0x0000) - not-present page
>> [  303.219787] PGD 12614067 P4D 12614067 PUD 12616067 PMD 0
>> [  303.220266] Oops: 0000 [#1] SMP NOPTI
>> [  303.220587] CPU: 0 PID: 424 Comm: cat Not tainted 5.4.0-next-20191128+ #17
> 

Hi Daniel,

> I can't reproduce this on x86-64 qemu, next-20191128 or mainline, with either
> memory size.  What config are you using?  How often are you hitting it?

Thanks for trying to reproduce! Hah, I left out one piece that is needed to
reproduce via "cat /proc/kpageflags > /dev/null": a DIMM on my QEMU cmdline (see below).

I can reproduce it reliably (QEMU with "-m 4160M") via

[root@localhost ~]# uname -a
Linux localhost 5.5.0-rc1-next-20191209 #93 SMP Tue Dec 10 10:46:19 CET 2019 x86_64 x86_64 x86_64 GNU/Linux
[root@localhost ~]# ./page-types -r -a 0x144001
[  200.476376] BUG: unable to handle page fault for address: fffffffffffffffe
[  200.477500] #PF: supervisor read access in kernel mode
[  200.478334] #PF: error_code(0x0000) - not-present page
[  200.479076] PGD 59614067 P4D 59614067 PUD 59616067 PMD 0 
[  200.479557] Oops: 0000 [#4] SMP NOPTI
[  200.479875] CPU: 0 PID: 603 Comm: page-types Tainted: G      D W         5.5.0-rc1-next-20191209 #93
[  200.480646] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu4
[  200.481648] RIP: 0010:stable_page_flags+0x4d/0x410
[  200.482061] Code: f3 ff 41 89 c0 48 b8 00 00 00 00 01 00 00 00 45 84 c0 0f 85 cd 02 00 00 48 8b 53 08 48 8b 2b 48f
[  200.483644] RSP: 0018:ffffb139401cbe60 EFLAGS: 00010202
[  200.484091] RAX: fffffffffffffffe RBX: fffffbeec5100040 RCX: 0000000000000000
[  200.484697] RDX: 0000000000000001 RSI: ffffffff9535c7cd RDI: 0000000000000246
[  200.485313] RBP: ffffffffffffffff R08: 0000000000000000 R09: 0000000000000000
[  200.485917] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000144001
[  200.486523] R13: 00007ffd6ba55f48 R14: 00007ffd6ba55f40 R15: ffffb139401cbf08
[  200.487130] FS:  00007f68df717580(0000) GS:ffff9ec77fa00000(0000) knlGS:0000000000000000
[  200.487804] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  200.488295] CR2: fffffffffffffffe CR3: 0000000135d48000 CR4: 00000000000006f0
[  200.488897] Call Trace:
[  200.489115]  kpageflags_read+0xe9/0x140
[  200.489447]  proc_reg_read+0x3c/0x60
[  200.489755]  vfs_read+0xc2/0x170
[  200.490037]  ksys_pread64+0x65/0xa0
[  200.490352]  do_syscall_64+0x5c/0xa0
[  200.490665]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

(The tool is located in tools/vm/page-types.c; see also patch #2.)


To reproduce via "cat /proc/kpageflags > /dev/null", you have to hotplug or
coldplug one DIMM so that max_pfn moves beyond the garbage memmap; otherwise
the sequential read stops at max_pfn before reaching it (see also patch #2).
My QEMU cmdline with Fedora 31:

qemu-system-x86_64 \
    --enable-kvm \
    -m 4160M,slots=4,maxmem=8G \
    -hda Fedora-Cloud-Base-31-1.9.x86_64.qcow2 \
    -machine pc \
    -nographic \
    -nodefaults \
    -chardev stdio,id=serial,signal=off \
    -device isa-serial,chardev=serial \
    -object memory-backend-ram,id=mem0,size=1024M \
    -device pc-dimm,id=dimm0,memdev=mem0

[root@localhost ~]# uname -a
Linux localhost 5.3.7-301.fc31.x86_64 #1 SMP Mon Oct 21 19:18:58 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
[root@localhost ~]# cat /proc/kpageflags > /dev/null
[  111.517275] BUG: unable to handle page fault for address: fffffffffffffffe
[  111.517907] #PF: supervisor read access in kernel mode
[  111.518333] #PF: error_code(0x0000) - not-present page
[  111.518771] PGD a240e067 P4D a240e067 PUD a2410067 PMD 0 

> 
> It may not have anything to do with the config, and I may be getting lucky with
> the garbage in my memory.
> 

Some options from my config that might be relevant:

# CONFIG_PAGE_POISONING is not set
CONFIG_DEFERRED_STRUCT_PAGE_INIT=y
CONFIG_SPARSEMEM_EXTREME=y
CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
CONFIG_SPARSEMEM_VMEMMAP=y
CONFIG_HAVE_MEMBLOCK_NODE_MAP=y
CONFIG_MEMORY_HOTPLUG=y
CONFIG_MEMORY_HOTPLUG_SPARSE=y
CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y

The F31 default config should trigger it.


I will update the patch description. Thanks!

...

-- 
Thanks,

David / dhildenb


Thread overview: 13+ messages
2019-12-09 17:48 [PATCH v1 0/3] mm: fix max_pfn not falling on section boundary David Hildenbrand
2019-12-09 17:48 ` [PATCH v1 1/3] mm: fix uninitialized memmaps on a partially populated last section David Hildenbrand
2019-12-09 21:15   ` Daniel Jordan
2019-12-10 10:11     ` David Hildenbrand [this message]
2019-12-10 22:18       ` Daniel Jordan
2019-12-09 17:48 ` [PATCH v1 2/3] fs/proc/page.c: allow inspection of last section and fix end detection David Hildenbrand
2019-12-10  0:46   ` kbuild test robot
2019-12-10  0:46     ` kbuild test robot
2019-12-10  1:04   ` kbuild test robot
2019-12-10  1:04     ` kbuild test robot
2019-12-10 10:53     ` David Hildenbrand
2019-12-10 10:53       ` David Hildenbrand
2019-12-09 17:48 ` [PATCH v1 3/3] mm: initialize memmap of unavailable memory directly David Hildenbrand
